June 22, 2021

Python Collections & Specialized Collection Data Types

Crash Course Into Python Part 2 [Click here for Part 1]

What Are Collections In Python?

Collections in Python are container data types: lists, tuples, sets, and dictionaries. Each has different characteristics suited to different uses. Here is a quick summary of what each container data type does.

Lists

  • declared in [] brackets
  • mutable — can change the values once you’ve declared the list
  • can store duplicate values
  • values accessible via indexes

Tuples

  • declared in () brackets
  • ordered and immutable — ie. cannot change a value inside a tuple once you’ve declared it
  • can have duplicate entries

Sets

  • declared in {} brackets (an empty set needs set(), because {} creates an empty dictionary)
  • unordered and not indexed, i.e. values cannot be accessed by position
  • no duplicate entries

Dictionaries

  • declared in {} brackets
  • has key-value pairs
  • mutable — can change the values inside a dictionary
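The four built-in containers summarized above can be sketched in a few lines. This is only an illustration; the variable names and values are made up for the example.

```python
# A list: ordered, mutable, allows duplicates, accessible by index.
animals = ['tiger', 'lion', 'lion']
animals[0] = 'seal'               # lists can be changed in place
print(animals)                    # ['seal', 'lion', 'lion']

# A tuple: ordered and immutable, duplicates allowed.
point = (3, 4, 3)
print(point[1])                   # 4

# A set: unordered, no duplicates, no access by position.
unique_animals = set(animals)
print(len(unique_animals))        # 2
print('lion' in unique_animals)   # True

# A dictionary: mutable key-value pairs.
prices = {'banana': 1.99}
prices['apple'] = 2.99
print(prices)                     # {'banana': 1.99, 'apple': 2.99}
```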

These are Python’s general-purpose built-in container types. However, Python always has something a little extra, which brings us to the collections module.

Here’s a quick cheatsheet summary for the above.

What Is The Collections Module In Python?

Python’s collections module provides specialized container data types that cover some of the shortcomings of the basic built-in collection types. They are alternatives to lists, tuples, sets, and dictionaries.

Because collections is a module, you will have to import what you need from it using import. The following is a general summary of what the different specialized collection data structures can do.

namedtuple()

namedtuple() returns a tuple type with named fields. This means that there will be a name assigned to each value inside a tuple.

For example, if we look at a tuple in general, it doesn’t have any named entries. It’s just a straight value assigned to a particular index.

animals = ('tiger', 'lion', 'seal')

With a named tuple, it becomes easier to access values because you’re relying on something that is more informative than just index numbers.

When using namedtuple(), you’ll need to set up the data structure first. Then you can add values to it.

from collections import namedtuple

animalList = namedtuple('animals', 'type, quantity, origin')
lions = animalList('lion', '5', 'Africa')
print(lions)

# will return:
# animals(type='lion', quantity='5', origin='Africa')

In the example above, we set up the data structure in animalList with namedtuple(). The first parameter is the name of the new tuple type. The second parameter lists the field names, one for each value in the tuple.

lions uses the structure created in animalList and assigns the values in the same order as the field names passed to namedtuple(), which gives each value a name, as the printed result shows.

You can also create a namedtuple from a list (or any other iterable) using _make(). Here is the syntax example:

from collections import namedtuple

animalList = namedtuple('animals', 'type, quantity, origin')
lions = animalList._make(['lion','5','Africa'])
print(lions)

# will return: 
# animals(type='lion', quantity='5', origin='Africa')
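The examples above print the whole tuple, but the main benefit of a named tuple is reading single values by name. Here is a minimal sketch that reuses the animalList structure from above; more_lions is a name made up for illustration.

```python
from collections import namedtuple

animalList = namedtuple('animals', 'type, quantity, origin')
lions = animalList('lion', '5', 'Africa')

# Access by field name, or by index like a normal tuple.
print(lions.type)             # lion
print(lions[2])               # Africa

# The field names are stored on the tuple type.
print(lions._fields)          # ('type', 'quantity', 'origin')

# Convert to a plain dictionary.
print(dict(lions._asdict()))
# {'type': 'lion', 'quantity': '5', 'origin': 'Africa'}

# Named tuples are immutable, so _replace() returns a new tuple.
more_lions = lions._replace(quantity='7')
print(more_lions)
# animals(type='lion', quantity='7', origin='Africa')
```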

deque

deque (pronounced ‘deck’, short for double-ended queue) is a list-like container optimized for adding and removing values at both ends. Here is an example of using deque.

from collections import deque

name = ['a','p','h','i','n','y','a']
d = deque(name)
print(d)

# will return:
# deque(['a', 'p', 'h', 'i', 'n', 'y', 'a'])

To add values at the end, you can use append(). Here is an example:

from collections import deque

name = ['a','p','h','i','n','y','a']
d = deque(name)
d.append('dechalert')
print(d)

# will return:
# deque(['a', 'p', 'h', 'i', 'n', 'y', 'a', 'dechalert'])

To put a value at the beginning of the deque, you can use appendleft(). Here is an example:

from collections import deque

name = ['a','p','h','i','n','y','a']
d = deque(name)
d.appendleft('dechalert')
print(d)

# will return:
# deque(['dechalert', 'a', 'p', 'h', 'i', 'n', 'y', 'a'])

To remove a value at the end of the deque, you can use pop(). Here is an example:

from collections import deque

name = ['a','p','h','i','n','y','a']
d = deque(name)
d.pop()
print(d)

# will return:
# deque(['a', 'p', 'h', 'i', 'n', 'y'])

To remove a value from the beginning of the deque, you can use popleft(). Here is an example:

from collections import deque

name = ['a','p','h','i','n','y','a']
d = deque(name)
d.popleft()
print(d)

# will return:
# deque(['p', 'h', 'i', 'n', 'y', 'a'])

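On the surface, all this looks exactly the same as a normal list. However, deque really shines when you add or remove items at both ends of a large collection. Appending to or popping from either end of a deque takes roughly constant time, while inserting or removing at the front of a list has to shift every remaining item. For example, with thousands of items, repeatedly calling popleft() on a deque is much faster than calling pop(0) on a normal list. The trade-off is that accessing items in the middle of a deque by index is slower than with a list.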
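To get a feel for the difference, here is a rough timing sketch. The helper names drain_list and drain_deque and the size n are made up for illustration, and the actual timings will vary from machine to machine.

```python
from collections import deque
from timeit import timeit

def drain_list(n):
    """Remove n items from the front of a list, one at a time."""
    items = list(range(n))
    while items:
        items.pop(0)        # every pop shifts the remaining items
    return len(items)

def drain_deque(n):
    """Remove n items from the front of a deque, one at a time."""
    items = deque(range(n))
    while items:
        items.popleft()     # no shifting needed
    return len(items)

n = 10_000  # arbitrary size chosen for illustration
list_time = timeit(lambda: drain_list(n), number=5)
deque_time = timeit(lambda: drain_deque(n), number=5)

print(f'list.pop(0):     {list_time:.4f}s')
print(f'deque.popleft(): {deque_time:.4f}s')
```

On most machines the deque version should come out clearly ahead, and the gap should widen as n grows.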

ChainMap

ChainMap is a dictionary-like class that creates a single view of multiple mappings. In a nutshell, it groups several dictionaries together without copying them, and a lookup searches each dictionary in order until it finds the key. For example, if we have two dictionaries with several key-value pairs, ChainMap will make a single view of both.

Here is what it looks like in Python:

from collections import ChainMap

banana = {'name': 'banana', 'price': '1.99'}
apple = {'name': 'apple', 'price': '2.99', 'sale': 'false'}
fruits = ChainMap(banana,apple)
print(fruits)

# will return
# ChainMap({'name': 'banana', 'price': '1.99'}, {'name': 'apple', 'price': '2.99', 'sale': 'false'})
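Printing the ChainMap only shows the dictionaries it wraps. The short sketch below, using the same banana and apple dictionaries, shows how lookups and updates behave on the combined view.

```python
from collections import ChainMap

banana = {'name': 'banana', 'price': '1.99'}
apple = {'name': 'apple', 'price': '2.99', 'sale': 'false'}
fruits = ChainMap(banana, apple)

# Lookups search the dictionaries in order and return the first match.
print(fruits['name'])    # banana (found in the first dictionary)
print(fruits['sale'])    # false  (only exists in the second dictionary)

# Writes only ever touch the first dictionary.
fruits['price'] = '0.99'
print(banana['price'])   # 0.99
print(apple['price'])    # 2.99

# Iterating gives each key once, even if it appears in both dictionaries.
print(sorted(fruits))    # ['name', 'price', 'sale']
```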

Counter

Counter is a dictionary subclass for counting hashable objects. What this means is that it will count how many times each value appears in the list and return the counts in a dictionary-like object. Here is an example:

from collections import Counter

fruits = ['apple','apple','banana','pear','banana','banana','blueberries','blueberries','apple','banana','pineapple','pear']
countFruit = Counter(fruits)
print(countFruit)

# will return
# Counter({'banana': 4, 'apple': 3, 'pear': 2, 'blueberries': 2, 'pineapple': 1})

What Counter() is basically doing is tallying the duplicates in a given list and returning the counts as a dictionary. Counter() works for any iterable object whose items are hashable. This means that it will also work for tuples, sets, and strings.
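As a quick illustration of Counter() working on other iterables, here is a small sketch with a string and a tuple. The names letters and sizes are made up for the example.

```python
from collections import Counter

# A string is an iterable of characters.
letters = Counter('banana')
print(letters)       # Counter({'a': 3, 'n': 2, 'b': 1})

# A tuple works the same way as a list.
sizes = Counter(('S', 'M', 'S', 'L', 'S'))
print(sizes)         # Counter({'S': 3, 'M': 1, 'L': 1})

# Asking for something that was never counted returns 0, not an error.
print(sizes['XL'])   # 0
```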

Counter() also has other methods you can call on the resulting counter. Three useful ones are covered here, starting with elements().

elements() returns an iterator over the items in the counter, repeating each item as many times as its count. Here is an example (continuing with the data from above):

print(list(countFruit.elements()))

# this will return:
# ['apple', 'apple', 'apple', 'banana', 'banana', 'banana', 'banana', 'pear', 'pear', 'blueberries', 'blueberries', 'pineapple']

What’s happening above is that we’ve put the results of countFruit.elements() inside a list so that we can easily see them. You can replace list() with tuple() or set(), depending on your data needs, keeping in mind that a set will drop the duplicates.

Another method you can use with Counter() is most_common(). This will return a list of (item, count) pairs sorted from the most common to the least common.

print(countFruit.most_common())

# this will return:
# [('banana', 4), ('apple', 3), ('pear', 2), ('blueberries', 2), ('pineapple', 1)]

And finally, we have the subtract() method. This will subtract the counts given in a dictionary (or another iterable) from the counter. For example:

subtractFruit = {'banana':1, 'apple':2}
countFruit.subtract(subtractFruit)
print(countFruit.most_common())

# this will return:
# [('banana', 3), ('pear', 2), ('blueberries', 2), ('apple', 1), ('pineapple', 1)]
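Two details are worth a quick sketch: most_common() accepts an optional number of results, and subtract() can push a count to zero or below. The stock counter here is made up for illustration.

```python
from collections import Counter

stock = Counter({'banana': 3, 'apple': 1, 'pear': 2})

# Limit the result to the top two items.
print(stock.most_common(2))   # [('banana', 3), ('pear', 2)]

# subtract() does not stop at zero.
stock.subtract({'apple': 2})
print(stock['apple'])         # -1

# Unary plus returns a new counter with only the positive counts.
print(+stock)                 # Counter({'banana': 3, 'pear': 2})
```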

OrderedDict

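OrderedDict is a dictionary subclass that remembers the order in which the entries were added. Since Python 3.7, normal dictionaries also keep insertion order, but OrderedDict still offers extras such as reordering methods and order-sensitive equality. Here is an example: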

from collections import OrderedDict
price = OrderedDict()
price['banana'] = '1.99'
price['apple'] = '2.99'
price['pear'] = '3.99'
print(price)

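# this will return (on Python 3.11 and earlier):
# OrderedDict([('banana', '1.99'), ('apple', '2.99'), ('pear', '3.99')])
# Python 3.12 and later print the same data as:
# OrderedDict({'banana': '1.99', 'apple': '2.99', 'pear': '3.99'})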

To get the keys from an ordered dictionary, you can use keys(). Here is an example:

print(price.keys())

# will return
# odict_keys(['banana', 'apple', 'pear'])
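Because OrderedDict tracks order, it also comes with a few order-aware operations. Here is a minimal sketch using the same price data; the names first, a, and b are made up for illustration.

```python
from collections import OrderedDict

price = OrderedDict()
price['banana'] = '1.99'
price['apple'] = '2.99'
price['pear'] = '3.99'

# Move an existing key to the end of the order.
price.move_to_end('banana')
print(list(price))            # ['apple', 'pear', 'banana']

# Remove and return the first entry instead of the last.
first = price.popitem(last=False)
print(first)                  # ('apple', '2.99')

# Two OrderedDicts are only equal if the order matches too.
a = OrderedDict([('x', 1), ('y', 2)])
b = OrderedDict([('y', 2), ('x', 1)])
print(a == b)                 # False
print(dict(a) == dict(b))     # True
```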

defaultdict

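defaultdict is a dictionary subclass that calls a factory function to supply missing values. What this means is that it won’t throw an error when you look up a key that is missing from the dictionary. Instead, it creates the key with a default value and returns it.

It’s good to note that with defaultdict, you pass in a factory such as int or list, which determines the default value (int gives 0, list gives an empty list).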

from collections import defaultdict

prices = defaultdict(int)
prices['banana'] = '1.99'
prices['apple'] = '2.99'
prices['pear'] = '3.99'
print(prices)

# this will return
# defaultdict(<class 'int'>, {'banana': '1.99', 'apple': '2.99', 'pear': '3.99'})

print(prices['peach'])

# this will return
# 0

If you look up a non-existent key in a normal dictionary, you will get a KeyError.
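The contrast can be sketched side by side, along with a common use of defaultdict: grouping items into lists. The names plain, by_letter, and counts, and the fruit list, are made up for illustration.

```python
from collections import defaultdict

# A normal dictionary raises KeyError for a missing key.
plain = {'banana': '1.99'}
try:
    plain['peach']
except KeyError as err:
    print('KeyError:', err)   # KeyError: 'peach'

# With list as the factory, each missing key starts as an empty list.
by_letter = defaultdict(list)
for fruit in ['apple', 'avocado', 'banana', 'blueberry', 'pear']:
    by_letter[fruit[0]].append(fruit)
print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'p': ['pear']}

# Note that simply looking up a missing key adds it to the dictionary.
counts = defaultdict(int)
counts['peach']
print('peach' in counts)      # True
```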

And that’s basically it for this piece. Stay tuned for the next set of Python learning notes.