Reading Notes For Python Cookbook – Chapter 1 Data Structures and Algorithms


What it actually does is delete all of the roles belonging to that user first, and then insert a fresh copy, which is completely unacceptable.
On reflection, I figured the problem had to be on my side. After some searching (via Google), I found
that people had asked similar questions on StackOverflow, and the solution finally turned up in NHibernate Tip: Use set
for many-to-many
associations:
change the many-to-many mapping from bag to set, and the problem was solved.


You have multiple dictionaries or mappings that you want to logically
combine into a single mapping to perform certain operations, such as
looking up values or checking for the existence of keys.

19.5.3. Bags and lists are the most efficient inverse collections


You have two dictionaries and want to find what they might have in
common (same keys, same values, etc.).

Recently, while doing many-to-many updates with NHibernate, I suddenly noticed that its update strategy is quite poor:
updating a many-to-many relationship actually deletes all the rows first and then re-inserts all the data, which seemed very strange.


Your program has become an unreadable mess of hardcoded slice indices
and you want to clean it up.


1.14. Sorting Objects Without Native Comparison Support

Parent p = sess.Load<Parent>(id);
Child c = new Child();
c.Parent = p;
p.Children.Add(c);  // no need to fetch the collection!


You have an N-element tuple or sequence that you would like to unpack
into a collection of N variables.

However, bags are not entirely without merit:


import heapq

class PriorityQueue:
    def __init__(self):
        self._queue = []
        self._index = 0

    def push(self, item, priority):
        heapq.heappush(self._queue, (-priority, self._index, item))
        self._index += 1

    def pop(self):
        return heapq.heappop(self._queue)[-1]

The NHibernate reference documentation describes this in section 19.5. Understanding Collection
performance:


Under the covers, a Counter is a dictionary that maps the items to the
number of occurrences. If you want to increment the count manually,
simply use addition:

morewords = ['why','are','you','not','looking','in','my','eyes']
for word in morewords:
    word_counts[word] += 1

Or, alternatively, you could use the update() method:
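
For instance, with shortened illustrative word lists (the full lists appear elsewhere in these notes):

```python
from collections import Counter

words = ['look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes']
morewords = ['why', 'are', 'you', 'not', 'looking', 'in', 'my', 'eyes']

word_counts = Counter(words)
word_counts.update(morewords)   # adds the counts from morewords in place
```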


A little-known feature of Counter instances is that they can be easily
combined using various mathematical operations.

>>> a = Counter(words)
>>> b = Counter(morewords)
>>> a
Counter({'eyes': 8, 'the': 5, 'look': 4, 'into': 3, 'my': 3, 'around': 2,
"you're": 1, "don't": 1, 'under': 1, 'not': 1})
>>> b
Counter({'eyes': 1, 'looking': 1, 'are': 1, 'in': 1, 'not': 1, 'you': 1,
'my': 1, 'why': 1})
>>> # Combine counts
>>> c = a + b
>>> c
Counter({'eyes': 9, 'the': 5, 'look': 4, 'my': 4, 'into': 3, 'not': 2,
'around': 2, "you're": 1, "don't": 1, 'in': 1, 'why': 1,
'looking': 1, 'are': 1, 'under': 1, 'you': 1})
>>> # Subtract counts
>>> d = a - b
>>> d
Counter({'eyes': 7, 'the': 5, 'look': 4, 'into': 3, 'my': 2, 'around': 2,
"you're": 1, "don't": 1, 'under': 1})

Needless to say, Counter objects are a tremendously useful tool for
almost any kind of problem where you need to tabulate and count data.
You should prefer them over manually written solutions involving dictionaries.

19.5.2. Lists, maps, idbags and sets are the most efficient
collections to update


rows = [
    {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
    {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
    {'fname': 'John', 'lname': 'Cleese', 'uid': 1001},
    {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}
]

from operator import itemgetter
rows_by_fname = sorted(rows, key=itemgetter('fname'))
rows_by_uid = sorted(rows, key=itemgetter('uid'))
rows_by_lfname = sorted(rows, key=itemgetter('lname','fname'))

It's not just many-to-many: if your collection needs to be updated, NHibernate's recommendation is:
Python Cookbook – Recipes for mastering Python 3 – 3rd Edition

public class UserMapping : ClassMapping<User> {
    public UserMapping() {
        // Id generator and table name were lost in extraction and are assumed here
        Id(m => m.Id, map => map.Generator(Generators.Identity));
        Property(m => m.Name);
        Bag(m => m.Roles,
            map => {
                map.Table("User_Role");
                map.Key(k => { k.Column("[UserId]"); });
            },
            rel => {
                rel.ManyToMany(map => { map.Column("[RoleId]"); });
            });
    }
}

public class RoleMapping : ClassMapping<Role> {
    public RoleMapping() {
        Id(m => m.Id, map => map.Generator(Generators.Identity));
        Property(m => m.Name);
        Bag(m => m.Users,
            map => {
                map.Table("User_Role");
                map.Key(k => { k.Column("[RoleId]"); });
            },
            rel => {
                rel.ManyToMany(map => { map.Column("[UserId]"); });
            });
    }
}


If you are looking for the N smallest or largest items and N is small
compared to the overall size of the collection, these functions provide
superior performance.

The nlargest() and nsmallest() functions are most appropriate if you
are trying to find a relatively small number of items. If you are
simply trying to find the single smallest or largest item (N=1), it is
faster to use min() and max(). Similarly, if N is about the same
size as the collection itself, it is usually faster to sort it first and
take a slice.

The implementation of a heap is an interesting and worthwhile subject
of study.
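
To experiment with one directly, heapq.heapify() turns a plain list into a heap in place, after which heappop() yields items in ascending order:

```python
import heapq

nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
heap = list(nums)
heapq.heapify(heap)               # rearrange in place to satisfy the heap invariant, O(N)
smallest = heapq.heappop(heap)    # popping always yields the smallest item, O(log N)
```

After heapify(), heap[0] is always the smallest element without any popping at all.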

The code above removes the user's first role and then adds a new role. The SQL
NHibernate generates is as follows (showing only the operations on the relation table User_Role):


If all you want to do is eliminate duplicates, it is often easy enough
to make a set.

>>> a = [1, 5, 2, 1, 9, 1, 5, 10]
>>> set(a)
{1, 2, 10, 5, 9}

However, this approach doesn’t preserve any kind of ordering. So, the
resulting data will be scrambled afterward. The solution shown avoids this problem.

The use of a generator function in this recipe reflects the fact that
you might want the function to be extremely general purpose–not
necessarily tied directly to list processing. For example, if you want
to read a file, eliminating duplicated lines, you could simply do this:

with open(somefile, 'r') as f:
    for line in dedupe(f):
        ...

Bags are the worst case. Since a bag permits duplicate element values
and has no index column, no primary key may be defined. NHibernate has
no way of distinguishing between duplicate rows. NHibernate resolves
this problem by completely removing (in a single DELETE) and
recreating the collection whenever it changes. This might be very inefficient.

1.13. Sorting a List of Dictionaries by a Common Key

Just before you ditch bags forever, there is a particular case in
which bags (and also lists) are much more performant than sets. For a
collection with inverse="true" (the standard bidirectional one-to-many
relationship idiom, for example) we can add elements to a bag or list
without needing to initialize (fetch) the bag elements! This is
because IList.Add() must always succeed for a bag or IList (unlike an
ISet). This can make the following common code much faster.


In addition, you can map a slice onto a sequence of a specific size by
using its indices(size) method. This returns a tuple
(start, stop, step) where all values have been suitably limited to fit
within bounds (as to avoid IndexError exceptions when indexing).

>>> a = slice(5, 50, 2)
>>> s = 'HelloWorld'
>>> a.indices(len(s))
(5, 10, 2)

As you can see, bag performs poorly when a many-to-many mapping is updated.
If the collection never needs updating, a bag is perfectly safe to use; when updates are needed, set is the better choice.

1.4. Finding the Largest or Smallest N Items


1.15. Grouping Records Together Based on a Field

After changing all the many-to-many mappings in UserMapping and RoleMapping to Set,
the test code above generates the following SQL:


You have a sequence of items, and you’d like to determine the most
frequently occurring items in the sequence.

DELETE FROM [User_Role] WHERE [UserId] = @p0 AND [RoleId] = @p1;@p0 = 1 [Type: Int32 (0)], @p1 = 8 [Type: Int32 (0)]
INSERT INTO [User_Role]  ([UserId], [RoleId]) VALUES (@p0, @p1);@p0 = 1 [Type: Int32 (0)], @p1 = 9 [Type: Int32 (0)]


Now suppose you want to perform lookups where you have to check both
dictionaries (e.g., first checking in a and then in b if not found).
An easy way to do this is to use the ChainMap class from the
collections module.

a = {'x': 1, 'z': 3 }
b = {'y': 2, 'z': 4 }

from collections import ChainMap
c = ChainMap(a,b)
print(c['x'])  # Outputs 1 (from a)
print(c['y'])  # Outputs 2 (from b)
print(c['z'])  # Outputs 3 (from a)

public class User {
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
    public virtual ICollection<Role> Roles { get; set; }
    public User() {
        Roles = new HashSet<Role>();
    }
}

public class Role {
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
    public virtual ICollection<User> Users { get; set; }
    public Role() {
        Users = new HashSet<User>();
    }
}

Much of what can be accomplished with a dictionary comprehension might
also be done by creating a sequence of tuples and passing them to the
dict() function.

p1 = dict((key, value) for key, value in prices.items() if value > 200)

However, the dictionary comprehension solution is a bit clearer and
actually runs quite a bit faster.

DELETE FROM [User_Role] WHERE [UserId] = @p0;@p0 = 1 [Type: Int32 (0)]
INSERT INTO [User_Role]  ([UserId], [RoleId]) VALUES (@p0, @p1);@p0 = 1 [Type: Int32 (0)], @p1 = 2 [Type: Int32 (0)]
INSERT INTO [User_Role]  ([UserId], [RoleId]) VALUES (@p0, @p1);@p0 = 1 [Type: Int32 (0)], @p1 = 7 [Type: Int32 (0)]
INSERT INTO [User_Role]  ([UserId], [RoleId]) VALUES (@p0, @p1);@p0 = 1 [Type: Int32 (0)], @p1 = 6 [Type: Int32 (0)]
INSERT INTO [User_Role]  ([UserId], [RoleId]) VALUES (@p0, @p1);@p0 = 1 [Type: Int32 (0)], @p1 = 10 [Type: Int32 (0)]


To easily construct such dictionaries, you can use defaultdict in the
collections module. A feature of defaultdict is that it
automatically initializes the first value so you can simply focus on
adding items.

from collections import defaultdict

d = defaultdict(list)
d['a'].append(1)
d['a'].append(2)
d['b'].append(4)

d = defaultdict(set)
d['a'].add(1)
d['a'].add(2)
d['b'].add(4)

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction()) {
    var user = session.Query<User>().First();

    var firstRole = user.Roles.First();
    user.Roles.Remove(firstRole);

    var roleCount = session.Query<Role>().Count();
    var role = new Role { Name = "Role " + (roleCount + 1) };
    user.Roles.Add(role);

    // remainder reconstructed: persist the change and flush
    session.SaveOrUpdate(user);
    tx.Commit();
}



The solution involving zip() solves the problem by “inverting” the
dictionary into a sequence of (value, key) pairs. When performing
comparisons on such tuples, the value element is compared first,
followed by the key. This gives you exactly the behavior that you want
and allows reductions and sorting to be easily performed on the
dictionary content using a single statement.



Instead of using lambda, an alternative approach is to use operator.attrgetter().

That is, a user can have multiple roles and a role can have multiple users, a typical many-to-many relationship.


prices = {
    'ACME': 45.23,
    'AAPL': 612.78,
    'IBM': 205.55,
    'HPQ': 37.20,
    'FB': 10.75
}

min_price = min(zip(prices.values(), prices.keys()))
# min_price is (10.75, 'FB')
max_price = max(zip(prices.values(), prices.keys()))
# max_price is (612.78, 'AAPL')

prices_sorted = sorted(zip(prices.values(), prices.keys()))
# prices_sorted is [(10.75, 'FB'), (37.2, 'HPQ'),
#                   (45.23, 'ACME'), (205.55, 'IBM'),
#                   (612.78, 'AAPL')]



The core of this recipe concerns the use of the heapq module. The
functions heapq.heappush() and heapq.heappop() insert and remove
items from a list _queue in a way such that the first item in the list
has the smallest priority. The heappop() method always returns the
“smallest” item, so that is the key to making the queue pop the correct
items. Moreover, since the push and pop operations have O(log N)
complexity where N is the number of items in the heap, they are fairly
efficient even for fairly large values of N.

When adding or removing roles for a user, I found the update to be extremely inefficient. The code is as follows:


The choice of whether or not to use lambda or attrgetter() may be
one of personal preference. However, attrgetter() is often a tad bit
faster and also has the added feature of allowing multiple fields to be
extracted simultaneously.
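
For example, sorting instances by an attribute (the User class and data here are purely illustrative):

```python
from operator import attrgetter

# A hypothetical class, just for illustration
class User:
    def __init__(self, user_id):
        self.user_id = user_id

users = [User(23), User(3), User(99)]
by_id = sorted(users, key=attrgetter('user_id'))
ids = [u.user_id for u in by_id]
# attrgetter('last_name', 'first_name') would extract multiple fields at once
```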

1.20. Combining Multiple Mappings into a Single Mapping


Keeping a limited history is a perfect use for a collections.deque.
The name is pronounced “deck” and is short for “double-ended queue”.

from collections import deque

def search(lines, pattern, history=5):
    previous_lines = deque(maxlen=history)
    for line in lines:
        if pattern in line:
            yield line, previous_lines
        previous_lines.append(line)

# Example use on a file
if __name__ == '__main__':
    with open('somefile.txt') as f:
        for line, prevlines in search(f, 'python', 5):
            for pline in prevlines:
                print(pline, end='')
            print(line, end='')


You have a list of dictionaries and you would like to sort the entries
according to one or more of the dictionary values.

1.16. Filtering Sequence Elements


You want to make a dictionary that maps keys to more than one value (a
so-called “multidict”).


The itemgetter() function creates a callable that accepts a single
item from rows as input and returns a value that will be used as the
basis for sorting.

The functionality of itemgetter() is sometimes replaced by lambda
expressions which often works just fine. However, the solution involving
itemgetter() typically runs a bit faster. Thus, you might prefer it if
performance is a concern.

Last, but not least, don’t forget that the technique shown in this
recipe can be applied to functions such as min() and max().
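
A sketch using the same kind of rows as in the sorting example (subset shown):

```python
from operator import itemgetter

rows = [
    {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
    {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
]
lowest = min(rows, key=itemgetter('uid'))    # row with the smallest uid
highest = max(rows, key=itemgetter('uid'))   # row with the largest uid
```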


An OrderedDict internally maintains a doubly linked list that orders
the keys according to insertion order. When a new item is first created,
it is placed at the end of the list. Subsequent reassignment of an
existing key doesn’t change the order.

Be aware that the size of an OrderedDict is more than twice as large
as a normal dictionary due to the extra linked list that’s created.


The solution shows a subtle syntactic aspect of generator expressions
when supplied as the single argument to a function (i.e., you don’t need
repeated parentheses).

s = sum((x * x for x in nums))  # Pass generator-expr as argument
s = sum(x * x for x in nums)  # More elegant syntax

1.7. Keeping Dictionaries in Order


Suppose you have some code that is pulling specific data out of a record
string with fixed fields.

######    0123456789012345678901234567890123456789012345678901234567890
record = '....................100          .......513.25     ..........'
cost = int(record[20:32]) * float(record[40:48])

Instead of doing that, why not name the slices like this?

SHARES = slice(20,32)
PRICE = slice(40,48)
cost = int(record[SHARES]) * float(record[PRICE])

In the latter version, you avoid having a lot of mysterious hardcoded
indices, and what you’re doing becomes much clearer.


In principle, constructing a multivalued dictionary is simple. However,
initialization of the first value can be messy if you try to do it
yourself. Using a defaultdict simply leads to much cleaner code.

1.9. Finding Commonalities in Two Dictionaries

Chapter 1 Data Structures and Algorithms

1.19. Transforming and Reducing Data at the Same Time


The easiest way to filter sequence data is often to use a list comprehension.

>>> mylist = [1, 4, -5, 10, -7, 2, 3, -1]
>>> [n for n in mylist if n > 0]
[1, 4, 10, 2, 3]
>>> [n for n in mylist if n < 0]
[-5, -7, -1]

One potential downside of using a list comprehension is that it might
produce a large result if the original input is large. If this is a
concern, you can use generator expressions to produce the filtered
values iteratively.

>>> pos = (n for n in mylist if n > 0)
>>> pos
<generator object <genexpr> at 0x1006a0eb0>
>>> for x in pos:
...     print(x)
...
1
4
10
2
3


This is easily accomplished by using a dictionary comprehension.

prices = {
    'ACME': 45.23,
    'AAPL': 612.78,
    'IBM': 205.55,
    'HPQ': 37.20,
    'FB': 10.75
}

# Make a dictionary of all prices over 200
p1 = { key: value for key, value in prices.items() if value > 200 }

# Make a dictionary of tech stocks
tech_names = { 'AAPL', 'IBM', 'HPQ', 'MSFT' }
p2 = { key: value for key, value in prices.items() if key in tech_names }

1.11. Naming a Slice


The keys() method of a dictionary returns a keys-view object that
exposes the keys. A little-known feature of keys views is that they also
support common set operations such as unions, intersections, and
differences. Thus, if you need to perform common set operations with
dictionary keys, you can often just use the keys-view objects directly
without first converting them into a set.

The items() method of a dictionary returns an items-view object
consisting of (key,value) pairs. This object supports similar set
operations and can be used to perform operations such as finding out
which key-value pairs two dictionaries have in common.

The values() method of a dictionary does not support the set
operations described in this recipe.
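
If you do need set operations on values, converting them to a set first works (assuming the values are hashable):

```python
a = {'x': 1, 'y': 2, 'z': 3}
b = {'w': 10, 'x': 11, 'y': 2}

# values() views don't support & directly, so convert to sets first
common_values = set(a.values()) & set(b.values())
```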


You want to make a list of the largest or smallest N items in a collection.

It is worth noting that the phone_numbers variable will always be a
list, regardless of how many phone numbers are unpacked (including none).


You have code that accesses list or tuple elements by position, but this
makes the code somewhat difficult to read at times. You'd also like to be
less dependent on position in the structure by accessing the elements
by name.


You have data inside of a sequence, and need to extract values or reduce the
sequence using some criteria.


The heapq module has two functions — nlargest() and nsmallest() —
that do exactly what you want.

import heapq
nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
print(heapq.nlargest(3, nums)) # Prints [42, 37, 23]
print(heapq.nsmallest(3, nums)) # Prints [-4, 1, 2]

portfolio = [
    {'name': 'IBM', 'shares': 100, 'price': 91.1},
    {'name': 'AAPL', 'shares': 50, 'price': 543.22},
    {'name': 'FB', 'shares': 200, 'price': 21.09},
    {'name': 'HPQ', 'shares': 35, 'price': 31.75},
    {'name': 'YHOO', 'shares': 45, 'price': 16.35},
    {'name': 'ACME', 'shares': 75, 'price': 115.65}
]

cheap = heapq.nsmallest(3, portfolio, key=lambda s: s['price'])
expensive = heapq.nlargest(3, portfolio, key=lambda s: s['price'])


You want to implement a queue that sorts items by a given priority and
always returns the item with the highest priority on each pop operation.


List comprehensions and generator expressions are often the easiest and
most straightforward ways to filter simple data. They also have the
added power to transform the data at the same time.
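
For example, using the same mylist data as above, you can filter and transform in one step, or replace unwanted values instead of discarding them:

```python
mylist = [1, 4, -5, 10, -7, 2, 3, -1]

# Filter and transform together: squares of the positive values only
squared_pos = [n * n for n in mylist if n > 0]

# Replace negatives with zero instead of dropping them, keeping the length
clipped = [n if n > 0 else 0 for n in mylist]
```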

Another notable filtering tool is itertools.compress(), which takes an
iterable and an accompanying Boolean selector sequence as input. As
output, it gives you all of the items in the iterable where the
corresponding element in the selector is True. This can be useful if
you’re trying to apply the results of filtering one sequence to another
related sequence.

addresses = [
    '5412 N CLARK',
    '5148 N CLARK',
    '5800 E 58TH',
    '2122 N CLARK',
    '5645 N RAVENSWOOD',
    '1060 W ADDISON',
    '4801 N BROADWAY',
    '1039 W GRANVILLE',
]

counts = [0, 3, 10, 4, 1, 7, 6, 1]

>>> from itertools import compress
>>> more5 = [n > 5 for n in counts]
>>> more5
[False, False, True, False, False, True, True, False]
>>> list(compress(addresses, more5))
['5800 E 58TH', '1060 W ADDISON', '4801 N BROADWAY']


You have a sequence of dictionaries or instances and you want to iterate
over the data in groups based on the value of a particular field, such
as date.



You want to create a dictionary, and you also want to control the order
of items when iterating or serializing.


You want to keep a limited history of the last few items seen during
iteration or during some other kind of processing.


A very elegant way to combine a data reduction and a transformation is
to use a generator-expression argument.


Any sequence (or iterable) can be unpacked into variables using a simple
assignment operation.

name, shares, price, (year, mon, day) = [ 'ACME', 50, 91.1, (2012, 12, 21) ]


A ChainMap takes multiple mappings and makes them logically appear as
one. However, the mappings are not literally merged together. Instead, a
ChainMap simply keeps a list of the underlying mappings and redefines
common dictionary operations to scan the list.

If there are duplicate keys, the values from the first mapping get used.

Operations that mutate the mapping always affect the first mapping listed.
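
A short sketch, reusing the a and b dictionaries from the ChainMap example above, shows where mutations land:

```python
from collections import ChainMap

a = {'x': 1, 'z': 3}
b = {'y': 2, 'z': 4}
c = ChainMap(a, b)

c['z'] = 10   # the mutation goes to a, the first mapping
del c['z']    # the deletion also operates on a only
# del c['y'] would raise a KeyError: 'y' lives in b, not in the first mapping
```

After these operations, a no longer has a 'z' key at all, so c['z'] falls through to b and yields 4 again.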


To find out what the two dictionaries have in common, simply perform
common set operations using the keys() or items() methods.

a = { 'x' : 1, 'y' : 2, 'z' : 3 }
b = { 'w' : 10, 'x' : 11, 'y' : 2 }
# Find keys in common
a.keys() & b.keys()
# { 'x', 'y' }
# Find keys in a that are not in b
a.keys() - b.keys()
# { 'z' }
# Find (key,value) pairs in common
a.items() & b.items() # { ('y', 2) }

These kinds of operations can also be used to alter or filter dictionary
contents. For example, suppose you want to make a new dictionary with
selected keys removed. Here is some sample code using a dictionary comprehension:

# Make a new dictionary with certain keys removed
c = {key:a[key] for key in a.keys() - {'z', 'w'}}
# c is {'x': 1, 'y': 2}


You want to compare objects of the same class, but they don’t natively
support comparison operations.

1.17. Extracting a Subset of a Dictionary


The itertools.groupby() function is particularly useful for grouping
data together like this.

rows = [
    {'address': '5412 N CLARK', 'date': '07/01/2012'},
    {'address': '5148 N CLARK', 'date': '07/04/2012'},
    {'address': '5800 E 58TH', 'date': '07/02/2012'},
    {'address': '2122 N CLARK', 'date': '07/03/2012'},
    {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'},
    {'address': '1060 W ADDISON', 'date': '07/02/2012'},
    {'address': '4801 N BROADWAY', 'date': '07/01/2012'},
    {'address': '1039 W GRANVILLE', 'date': '07/04/2012'},
]

from operator import itemgetter
from itertools import groupby

# Sort by the desired field first
rows.sort(key=itemgetter('date'))

# Iterate in groups
for date, items in groupby(rows, key=itemgetter('date')):
    print(date)
    for i in items:
        print('    ', i)


You need to unpack N elements from an iterable, but the iterable may be
longer than N elements, causing a “too many values to unpack” exception.


One possible use of a namedtuple is as a replacement for a dictionary,
which requires more space to store. Thus, if you are building large data
structures involving dictionaries, use of a namedtuple will be more
efficient. However, be aware that unlike a dictionary, a namedtuple is immutable.

>>> s = Stock('ACME', 100, 123.45)
>>> s
Stock(name='ACME', shares=100, price=123.45)
>>> s.shares = 75
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: can't set attribute

If you need to change any of the attributes, it can be done using the
_replace() method of a namedtuple instance, which makes an entirely
new namedtuple with specified values replaced.

>>> s = s._replace(shares=75)
>>> s
Stock(name='ACME', shares=75, price=123.45)

A subtle use of the _replace() method is that it can be a convenient way
to populate named tuples that have optional or missing fields. To do
this, you make a prototype tuple containing the default values and then
use _replace() to create new instances with values replaced.

from collections import namedtuple
Stock = namedtuple('Stock', ['name', 'shares', 'price', 'date', 'time'])
# Create a prototype instance
stock_prototype = Stock('', 0, 0.0, None, None)
# Function to convert a dictionary to a Stock
def dict_to_stock(s):
    return stock_prototype._replace(**s)

Here is an example of how this code would work:

>>> a = {'name': 'ACME', 'shares': 100, 'price': 123.45}
>>> dict_to_stock(a)
Stock(name='ACME', shares=100, price=123.45, date=None, time=None)
>>> b = {'name': 'ACME', 'shares': 100, 'price': 123.45, 'date': '12/17/2012'}
>>> dict_to_stock(b)
Stock(name='ACME', shares=100, price=123.45, date='12/17/2012', time=None)

Last, but not least, it should be noted that if your goal is to define
an efficient data structure where you will be changing various instance
attributes, using namedtuple is not your best choice. Instead,
consider defining a class using __slots__.
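
A minimal sketch of what such a __slots__-based class might look like (the Stock fields mirror the namedtuple example above):

```python
class Stock:
    # __slots__ avoids a per-instance __dict__, saving memory,
    # while still allowing attributes to be reassigned
    __slots__ = ['name', 'shares', 'price']

    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price

s = Stock('ACME', 100, 123.45)
s.shares = 75   # mutable, unlike a namedtuple
```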


collections.namedtuple() provides these benefits. It is actually a
factory method that returns a subclass of the standard Python tuple
type. You feed it a type name, and the fields it should have, and it
returns a class that you can instantiate, passing in values for the
fields you’ve defined, and so no.

>>> from collections import namedtuple
>>> Subscriber = namedtuple('Subscriber', ['addr', 'joined'])
>>> sub = Subscriber('', '2012-10-19')
>>> sub
Subscriber(addr='', joined='2012-10-19')
>>> sub.addr
>>> sub.joined

Although an instance of a namedtuple looks like a normal class
instance, it is interchangeable with a tuple and supports all of the
usual operations such as indexing and unpacking.

A major use case for named tuples is decoupling your code from the
position of the elements it manipulates.

To illustrate, here is some code using ordinary tuples:

def compute_cost(records):
    total = 0.0
    for rec in records:
        total += rec[1] * rec[2]
    return total

Here is a version that uses a namedtuple:

from collections import namedtuple
Stock = namedtuple('Stock', ['name', 'shares', 'price'])

def compute_cost(records):
    total = 0.0
    for rec in records:
        s = Stock(*rec)
        total += s.shares * s.price
    return total


You want to eliminate the duplicated values in a sequence, but preserve
the order of the remaining items.


If the values in the sequence are hashable, the problem can be easily
solved using a set and a generator.

def dedupe(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)

>>> a = [1, 5, 2, 1, 9, 1, 5, 10]
>>> list(dedupe(a))
[1, 5, 2, 9, 10]

This only works if the items in the sequence are hashable. If you are
trying to eliminate duplicates in a sequence of unhashable types (such
as dicts), you can make a slight change to this recipe, as follows:

def dedupe(items, key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)

>>> a = [ {'x':1, 'y':2}, {'x':1, 'y':3}, {'x':1, 'y':2}, {'x':2, 'y':4}]
>>> list(dedupe(a, key=lambda d: (d['x'],d['y'])))
[{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 2, 'y': 4}]
>>> list(dedupe(a, key=lambda d: d['x']))
[{'x': 1, 'y': 2}, {'x': 2, 'y': 4}]


You want to make a dictionary that is a subset of another dictionary.

1.3. Keeping the Last N Items


You need to execute a reduction function, but first need to transform or
filter the data.


You want to perform various calculations (e.g., minimum value, maximum
value, sorting, etc.) on a dictionary of data.


Python “star expressions” can be used to address this problem.

def drop_first_last(grades):
    first, *middle, last = grades
    return avg(middle)

record = ('Dave', '', '773-555-1212', '847-555-1212')
name, email, *phone_numbers = record


Unpacking actually works with any object that happens to be iterable,
not just tuples or lists. This includes strings, files, iterators, and
generators. When unpacking, Python has no special syntax to discard
certain values, but you can often just pick a throwaway variable name
for it.
_, shares, price, _ = [ 'ACME', 50, 91.1, (2012, 12, 21) ]


The collections.Counter class is designed for just such a problem. It
even comes with a handy most_common() method that will give you the answer.

words = [
    'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes',
    'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around', 'the',
    'eyes', "don't", 'look', 'around', 'the', 'eyes', 'look', 'into',
    'my', 'eyes', "you're", 'under'
]

from collections import Counter
word_counts = Counter(words)
top_three = word_counts.most_common(3)


To control the order of items in a dictionary, you can use an OrderedDict
from the collections module. It exactly preserves the original insertion
order of data when iterating.
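
For example:

```python
from collections import OrderedDict

d = OrderedDict()
d['foo'] = 1
d['bar'] = 2
d['spam'] = 3
d['grok'] = 4

keys_in_order = list(d)   # iteration follows insertion order exactly
```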


Deques support thread-safe, memory-efficient appends and pops from
either side of the deque.

Although you could manually perform such operations on a list (e.g.,
appending, deleting, etc.), the queue solution is far more elegant and
runs a lot faster.

Adding or popping items from either end of a queue has O(1) complexity.
This is unlike a list where inserting or removing items from the front
of the list is O(N).
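
A quick illustration of the O(1) operations on both ends:

```python
from collections import deque

q = deque()
q.append(1)          # O(1) append on the right
q.append(2)
q.appendleft(0)      # O(1) append on the left
left = q.popleft()   # O(1) pop from the left
right = q.pop()      # O(1) pop from the right
```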


The groupby() function works by scanning a sequence and finding
sequential “runs” of identical values (or values returned by the given
key function). On each iteration, it returns the value along with an
iterator that produces all of the items in a group with the same value.

An important preliminary step is sorting the data according to the field
of interest. Since groupby() only examines consecutive items, failing
to sort first won’t group the records as you want.

1.2. Unpacking Elements from Iterables of Arbitrary Length

1.12. Determining the Most Frequently Occurring Items in a Sequence

1.1. Unpacking a Sequence into Separate Variables

1.18. Mapping Names to Sequence Elements

1.5. Implementing a Priority Queue

1.10. Removing Duplicates from a Sequence while Maintaining Order

1.8. Calculating with Dictionaries

1.6. Mapping Keys to Multiple Values in a Dictionary
