Friday, October 2, 2009

File-like objects

A Python file-like object can be seen as an abstraction to input/output streams. They are actually similar to Java's java.io.Reader/java.io.Writer abstraction (I'm coming from Java). The typical usage of file-like objects can be found in frameworks and APIs. The file-like object provides a generic way of reading/writing data and would fit an HTML-parser for example. The parser is not interested of where the data come from (files, sockets, etc), only the content, so naturally the implementation should not depend on how to open resources etc.

Basic methods that need to be implemented by a file-like object are:
  • read([size]) - read at most size bytes from the file.
  • readLine([size]) - read one entire line from the file.
  • write(str) - write a string to the file.
  • close() - close the file.
You should read the documentation of the function/method that uses the file-like object, often they specify which methods that are expected to be implemented by the file-like object. If the file-like object is only used for reading, no write-functions need to be supported. More information about file-like objects can be found here.

Ok, so why should you use file-like objects in your code?

Let's take a look at the ConfigParser class which can be used to read/write configuration data. You can expect to find methods for reading and writing configuration data using files but the class does also include support for file-like objects, which is great. For example it's very easy to write unittests for this class, thanks to the support of file-like objects,  without having to create real files with test data. To create a file-like object from a string we can use the StringIO module. The module have both read and write support.

Here's some code:
import unittest
import StringIO
import ConfigParser

class ConfigParserTest(unittest.TestCase):

    def testHasSection(self):
        """
        has_section() should return True if the section exists
        and False if is doesn't.
        """
        data = StringIO.StringIO("[My section]\nSetting=1\n")
        config = ConfigParser.ConfigParser()
        config.readfp(data)
        self.assertTrue(config.has_section("My section"))
        self.assertFalse(config.has_section("Not a section"))

    def testMissingHeader(self):
        """
        readfp() should raise MissingSectionHeaderError
        when parsing a file without section headers.
        """
        data = StringIO.StringIO("\nSetting=1\n")

        config = ConfigParser.ConfigParser()
        self.assertRaises(ConfigParser.MissingSectionHeaderError,
                          config.readfp, data)

if __name__ == "__main__":
    unittest.main()
And a simple write configuration test:
import unittest
import StringIO
import ConfigParser

class WriteConfigTest(unittest.TestCase):

    def testWriteConfig(self):
        """
        Test write a simple config
        """
        data = StringIO.StringIO()
        config = ConfigParser.ConfigParser()
        config.add_section("My Section")
        config.set("My Section", "my_option", "my_value")
        config.write(data)

        expected = "[My Section]\nmy_option = my_value\n\n"
        self.assertEquals(expected, data.getvalue())

if __name__ == "__main__":
    unittest.main() 
These unittests are of course only examples and do not fully test the ConfigParser class.
 
It's easy to see that file-like objects are very handy and fits well in frameworks and APIs. There are of course many advantages of using an abstraction when reading/writing data, some of them are:
  • Implementations do not depend on real files, socket, etc, only on a file-like object.
  • Easier to fake data in unittests.
  • Caller responsible of opening/closing the resource.
  • Less error-handling in framework code, delegate exceptions to the caller.
The downside with the current file-like object abstraction is that it's tailored for text I/O. In Python 2.6 and 3.x a new module have been added named io. The module originates from the New I/O PEP. The module enriches Python with an three-layer solution, the raw I/O layer, the buffered I/O layer and finally the text I/O layer. The raw and buffered layers deals with units of bytes and the text layer deals with units of characters. There are some things you should be aware of if you use the new I/O library in Python 2.6, read more about it here.

I've implemented a shoutcast playlist parser, plsparser, using the ConfigParser class. The parser is implemented as a Python generator function. Actually, I might do a post about generator functions someday.

No comments:

Post a Comment