Saturday, January 30, 2010

Named tuple

I recently read a post on Planet Python in which the author mentioned named tuples, which made me curious.

So what's a named tuple?
The namedtuple was introduced in Python 2.6. A named tuple type is created using a factory function from the collections module. It extends the basic tuple by assigning a name to each position, while still behaving as a regular tuple. This makes it possible to access fields by name instead of by index. According to the documentation, named tuple instances require no more memory than regular tuples because they don't have a per-instance dictionary.

The factory function signature is:
collections.namedtuple(typename, field_names[, verbose])
The first argument specifies the name of the new type. The second argument is a string containing the field names, separated by spaces or commas (a sequence of strings also works). Finally, if verbose is true, the factory function also prints the generated class definition.
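As a small sketch of the field_names spellings described above: the Point type and its fields are made up purely for illustration, and the snippet uses print() calls so that it also runs on Python 3.

```python
import collections

# Hypothetical type, used only to illustrate the field_names argument.
# Field names as a comma-separated string...
PointA = collections.namedtuple('Point', 'x, y')
# ...as a space-separated string...
PointB = collections.namedtuple('Point', 'x y')
# ...or as a sequence of strings.
PointC = collections.namedtuple('Point', ['x', 'y'])

# All three spellings should yield the same field layout.
print(PointA._fields)
print(PointB._fields)
print(PointC._fields)
```

Which spelling to use is mostly a matter of taste; the string forms are handy for short, fixed field lists.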

Enough theory, I'll show you an example.

Example
Say you have a tuple containing username and password. To access the username you get the item at position zero and the password is accessed at position one:
credential = ('mario', 'secret')
print 'Username:', credential[0]
print 'Password:', credential[1]
There's nothing wrong with this code, but the tuple isn't self-documenting. You have to find and read the documentation to learn which field lives at which position. This is where the named tuple enters the scene. We can rewrite the previous example as follows:
import collections
# Create a new tuple subclass named Credential
Credential = collections.namedtuple('Credential', 'username, password')

credential = Credential(username='mario', password='secret')

print 'Username:', credential.username
print 'Password:', credential.password
Nice, don't you agree?
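To illustrate the earlier point that a named tuple can still be used as a regular tuple, here is a minimal sketch reusing the Credential type from above. It is written with Python 3 style print() calls, which is an assumption on my part; the original examples target Python 2.

```python
import collections

Credential = collections.namedtuple('Credential', 'username, password')
credential = Credential(username='mario', password='secret')

# Index access still works, exactly as with a plain tuple.
print(credential[0])

# So does unpacking...
username, password = credential
print(username, password)

# ...and comparison with an ordinary tuple holding the same values.
print(credential == ('mario', 'secret'))

# Like any tuple, the instance is immutable: assigning to a field fails.
try:
    credential.password = 'changed'
except AttributeError as exc:
    print('cannot assign:', exc)
```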

If you are interested in what the code for the newly created Credential type looks like, you can add verbose=True to the argument list when creating the type. In this particular case we get the following output:
import collections
Credential = collections.namedtuple('Credential', 'username, password', verbose=True)

class Credential(tuple):                                     
        'Credential(username, password)'                     

        __slots__ = () 

        _fields = ('username', 'password') 

        def __new__(_cls, username, password):
            return _tuple.__new__(_cls, (username, password)) 

        @classmethod
        def _make(cls, iterable, new=tuple.__new__, len=len):
            'Make a new Credential object from a sequence or iterable'
            result = new(cls, iterable)                               
            if len(result) != 2:                                      
                raise TypeError('Expected 2 arguments, got %d' % len(result))
            return result

        def __repr__(self):
            return 'Credential(username=%r, password=%r)' % self

        def _asdict(t):
            'Return a new dict which maps field names to their values'
            return {'username': t[0], 'password': t[1]}

        def _replace(_self, **kwds):
            'Return a new Credential object replacing specified fields with new values'
            result = _self._make(map(kwds.pop, ('username', 'password'), _self))
            if kwds:
                raise ValueError('Got unexpected field names: %r' % kwds.keys())
            return result

        def __getnewargs__(self):
            return tuple(self)

        username = _property(_itemgetter(0))
        password = _property(_itemgetter(1))
The named tuple not only provides access to fields by name but also comes with helper methods, such as _make(), which creates a Credential instance from a sequence or iterable. For example:
cred_tuple = ('mario', 'secret')
credential = Credential._make(cred_tuple)
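The generated class shown earlier also defines _replace() and _asdict(). A short sketch of how they might be used follows; the 'new-secret' value is invented for the example.

```python
import collections

Credential = collections.namedtuple('Credential', 'username, password')
credential = Credential._make(['mario', 'secret'])

# _replace() returns a new instance; the original is left untouched.
updated = credential._replace(password='new-secret')
print(updated)
print(credential)

# _asdict() returns a mapping from field names to values.
as_dict = credential._asdict()
print(as_dict['username'])
```

Since tuples are immutable, _replace() is the way to get a modified copy rather than changing a field in place.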
There are more interesting use-cases and examples in the documentation, so I suggest that you take a peek.

Comments
I think named tuples are useful. They remove the error-prone indexing of plain tuples by providing access to fields by name, without adding any memory overhead. They are also regular Python classes, which means you can do with them anything you can do with classes.
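The memory claim can be checked informally. The sketch below assumes a recent CPython 3, where the empty __slots__ in the generated class means instances carry no __dict__. Some older interpreters exposed a __dict__ property on named tuples, so the first check may print something different there. Note that sys.getsizeof only reports the shallow size of an object.

```python
import collections
import sys

Credential = collections.namedtuple('Credential', 'username, password')

named = Credential('mario', 'secret')
plain = ('mario', 'secret')

# With __slots__ = () there is no per-instance attribute dictionary.
print(hasattr(named, '__dict__'))

# Shallow sizes in bytes of the named tuple and the plain tuple.
print(sys.getsizeof(named), sys.getsizeof(plain))
```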

8 comments:

  1. Keep in mind that named tuples are significantly slower than normal tuples (roughly 2-3 times slower).

  2. Not to mention the namedtuple implementation is a dirty ugly hack.

  3. Thanks for the comments. I haven't used the named tuple for anything yet; it just seemed like a good thing.

    Are there any other options that are less hacky and faster?

  4. Usually, I make "Data Transfer Objects" with something like this:

    class Dto(object):
      def __init__(self, **kw):
        self.__dict__.update(kw)

    dto = Dto(username='mario', password='secret')
    print dto.username, dto.password

  5. It looks like a dictionary to me......

    What are the pros and cons of namedtuple vs dictionary?

  6. Well, you can still use the namedtuple as an ordinary tuple, for example in for-statements and so forth. You only add a "layer" on top of the tuple and can use that to fetch items from the tuple 'by name' instead of by index.

    If you use a dict you have to call, for example, mydict.values() in a for-statement which returns a copy of the dictionary’s list of values.

  7. Ahhh .... now that makes sense :)
    thanks for the clarification :)

  8. Also, a namedtuple avoids the overhead of having to store the attribute names in every instance. You would have that overhead if you stuffed the data into a dictionary or into the Data Transfer Objects the prior poster suggested (which are really just another syntax for a dictionary, with the storage backed by the instance's __dict__). Named tuples are classes defined with __slots__ and are consequently highly memory efficient. There is a reason they are used all over the Python standard library.
