By: Patrick O'Brien
Added: 02 June 2008
Use serialization to store Python objects
Persistence is all about keeping objects around, even between executions of a program. In this article you'll get a general understanding of various persistence mechanisms for Python objects, from relational databases to Python pickles and beyond. You'll also take an in-depth look at Python's object serialization capabilities.

What is persistence?

The basic idea of persistence is fairly simple. Let's say you've got a Python program, perhaps to manage your daily to-do list, and you want to save your application objects (your to-do items) between uses of the program. In other words, you want to store your objects to disk and retrieve them later. That's persistence. To accomplish that goal you've got several options, each with advantages and disadvantages.

For example, you could store your object's data in some kind of formatted text file, such as a CSV file. Or you could use a relational database, such as Gadfly, MySQL, PostgreSQL, or DB2. These file formats and databases are well established, and Python has robust interfaces for all of these storage mechanisms.
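As a rough illustration of the formatted-text option, here is a minimal sketch that saves to-do items to a CSV file and reads them back with the standard csv module. The field names, the save_items and load_items helpers, and the sample data are made up for this example.

```python
import csv
import os
import tempfile

# Hypothetical column layout for a to-do item.
FIELDS = ["title", "done"]


def save_items(path, items):
    """Write a list of to-do dicts to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for item in items:
            # CSV stores only text, so the boolean is encoded as 0 or 1.
            writer.writerow({"title": item["title"], "done": int(item["done"])})


def load_items(path):
    """Read the CSV file back into a list of to-do dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return [
            {"title": row["title"], "done": row["done"] == "1"}
            for row in csv.DictReader(f)
        ]


todo = [
    {"title": "Buy milk", "done": False},
    {"title": "Write report, then send it", "done": True},
]
path = os.path.join(tempfile.mkdtemp(), "todo.csv")
save_items(path, todo)
restored = load_items(path)
```

Notice that the program itself has to decide how each value is written as text and how it is converted back. The file knows nothing about Python types.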

One thing these storage mechanisms all have in common is that data is stored independently of the objects and programs that operate on the data. The benefit is that the data then becomes available as a shared resource for other applications. The drawback is that allowing access to an object's data in this way violates the object-oriented principle of encapsulation, in which an object's data should be accessible only through its own public interface.

For some applications, then, the relational database approach may not be ideal, chiefly because relational databases do not understand objects. Instead, relational databases impose their own type system and their own data model of relations (tables), each containing a set of tuples (rows) made up of a fixed number of statically typed fields (columns). If the object model for your application doesn't translate easily into the relational model, you'll have quite a challenge mapping your objects to tuples and back again. This challenge is often referred to as the impedance mismatch problem.
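To make the object-to-tuple mapping concrete, here is a small sketch that uses the sqlite3 module from the standard library (chosen only because it needs no server; it is not one of the databases named above). The TodoItem class, the table layout, and the helper functions are all assumptions made for this example.

```python
import sqlite3


class TodoItem:
    """A hypothetical application object with two attributes."""

    def __init__(self, title, done=False):
        self.title = title
        self.done = done


def create_schema(conn):
    # The relational side: a fixed set of typed columns.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS todo ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, done INTEGER NOT NULL)"
    )


def save_item(conn, item):
    """Flatten an object into a row; the bool becomes an integer."""
    cur = conn.execute(
        "INSERT INTO todo (title, done) VALUES (?, ?)",
        (item.title, int(item.done)),
    )
    return cur.lastrowid


def load_items(conn):
    """Rebuild objects from rows; the integer becomes a bool again."""
    rows = conn.execute("SELECT title, done FROM todo ORDER BY id")
    return [TodoItem(title, bool(done)) for title, done in rows]


conn = sqlite3.connect(":memory:")
create_schema(conn)
save_item(conn, TodoItem("Buy milk"))
save_item(conn, TodoItem("Write report", done=True))
items = load_items(conn)
```

Even for a class this simple, the mapping code has to be written and maintained by hand in both directions. Nested objects, references between objects, and inheritance make that translation considerably harder.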


Object persistence

If you want to transparently store Python objects without losing their identity, type, etc., then you need some form of object serialization: a process that turns arbitrarily complex objects into textual or binary representations of those objects. Likewise, you must be able to restore the serialized form of an object back into an object that is the same as the original. In Python the serialization process is called pickling, and you can pickle/unpickle your objects to/from a string, a file on disk, or any file-like object. We'll look at pickling in detail later in this article.
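As a quick preview, the three targets mentioned above can be sketched with the standard pickle module. The sample dictionary and the file name are made up for this example. In Python 3 the in-memory form is a bytes object rather than a text string, and files must be opened in binary mode.

```python
import io
import os
import pickle
import tempfile

todo = {"title": "Buy milk", "done": False, "tags": ["shopping", "today"]}

# 1. To and from an in-memory byte string.
data = pickle.dumps(todo)
from_bytes = pickle.loads(data)

# 2. To and from a file on disk (opened in binary mode).
path = os.path.join(tempfile.mkdtemp(), "todo.pickle")
with open(path, "wb") as f:
    pickle.dump(todo, f)
with open(path, "rb") as f:
    from_file = pickle.load(f)

# 3. To and from any file-like object, here an in-memory buffer.
buffer = io.BytesIO()
pickle.dump(todo, buffer)
buffer.seek(0)  # rewind before reading back
from_buffer = pickle.load(buffer)
```

Each restored object is equal to the original, with its types intact, but it is a new object rather than the same one in memory.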

Let's say you like the idea of keeping everything as an object and avoiding the overhead of translating objects into some kind of non-object based storage. Pickle files provide those benefits, but sometimes you need something more robust and scalable than simple pickle files. For example, pickling alone doesn't solve the problem of naming and locating the pickle files, nor does it support concurrent access to persistent objects. For those features you need to turn to something like ZODB, the Z object database for Python. ZODB is a robust, multi-user, object-oriented database system capable of storing and managing arbitrarily complex Python objects with transaction support and concurrency control. (See Resources to download ZODB.) Interestingly enough, even ZODB relies upon Python's native serialization capability, and to use ZODB effectively you must have a solid understanding of pickling.
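For the naming-and-locating part of the problem, one small step up from raw pickle files is the shelve module in the standard library, which stores pickled values under string keys in a single database file. This is a substitute illustration, not ZODB, and the keys and file name below are made up for the example.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "todo_shelf")

# Store two pickled lists under names of our choosing.
with shelve.open(path) as shelf:
    shelf["groceries"] = ["milk", "eggs"]
    shelf["work"] = ["write report"]

# Later (or in another run of the program), look them up by name.
with shelve.open(path) as shelf:
    keys = sorted(shelf.keys())
    groceries = shelf["groceries"]
```

A shelf still offers no transactions and no support for concurrent readers and writers, which is the gap a system like ZODB is meant to fill.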

Another interesting approach to the persistence problem, originally implemented in Java, is called Prevayler. (See Resources for a developerWorks article on Prevayler.) A group of Python programmers recently ported Prevayler to Python and the result, called PyPerSyst, is hosted on SourceForge. (See Resources for a link to the PyPerSyst project.) The Prevayler/PyPerSyst concept also builds upon the native serialization capabilities of the Java and Python languages. PyPerSyst keeps an entire object system in memory, and provides disaster recovery by occasionally pickling a snapshot of the system to disk and by maintaining a log of commands that can be reapplied to the latest snapshot. Applications that use PyPerSyst are therefore limited by available RAM. The advantages are that a native object system completely loaded in memory is extremely fast, and that it is much simpler to implement than a system, such as ZODB, that allows for more objects than can be held in memory at once.
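The snapshot-plus-command-log idea can be sketched in a few dozen lines. What follows is not PyPerSyst's or Prevayler's API; the TinyPrevalence class, its method names, the file names, and the two commands are all invented for illustration. The sketch also skips details a real system would need, such as writing snapshots atomically.

```python
import os
import pickle
import tempfile


class TinyPrevalence:
    """Hypothetical toy: state lives in memory, recovery uses snapshot + log."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.pickle")
        self.log_path = os.path.join(directory, "commands.log")
        self.state = {"todo": []}
        self._recover()

    def _apply(self, command):
        """Apply one (name, args) command to the in-memory state."""
        name, args = command
        if name == "add":
            self.state["todo"].append(args[0])
        elif name == "remove":
            self.state["todo"].remove(args[0])
        else:
            raise ValueError("unknown command: %r" % (name,))

    def execute(self, name, *args):
        """Apply a command, then append it to the log on disk."""
        command = (name, args)
        # Apply first so a command that fails is never logged and replayed.
        self._apply(command)
        with open(self.log_path, "ab") as log:
            pickle.dump(command, log)

    def snapshot(self):
        """Pickle the whole state and start a fresh, empty log."""
        with open(self.snapshot_path, "wb") as f:
            pickle.dump(self.state, f)
        open(self.log_path, "wb").close()

    def _recover(self):
        """Load the latest snapshot, then replay the logged commands."""
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, "rb") as f:
                self.state = pickle.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path, "rb") as log:
                while True:
                    try:
                        command = pickle.load(log)
                    except EOFError:
                        break  # end of the log
                    self._apply(command)


directory = tempfile.mkdtemp()
system = TinyPrevalence(directory)
system.execute("add", "Buy milk")
system.snapshot()
system.execute("add", "Write report")
system.execute("remove", "Buy milk")

# Simulate a restart: a new instance rebuilds the state from disk.
recovered = TinyPrevalence(directory)
```

Here the recovered system loads the snapshot taken after the first command and replays the two commands logged since then, ending up in the same state as the original.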

Now that we've briefly touched upon the various ways to store our persistent objects, it's time to examine the pickling process in detail. While our main interest is in exploring ways to persist Python objects without having to translate them into some other format, we are still left with various concerns, such as: how to effectively pickle and unpickle both simple and complex objects, including instances of custom classes; how to maintain object references, including circular and recursive references; and how to handle changes to class definitions without running into problems with previously pickled instances. We'll cover all of these issues in the following examination of Python's pickling capabilities.
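As a first taste of the reference-handling concern, this short sketch pickles a structure that contains both a shared reference and a circular one, using only built-in types. The variable names are made up for the example.

```python
import pickle

tasks = ["Buy milk", "Write report"]

# A shared reference: two keys point at the very same list object.
plan = {"today": tasks, "urgent": tasks}

# A circular reference: the list now contains itself as its last element.
tasks.append(tasks)

restored = pickle.loads(pickle.dumps(plan))

# Inside the restored structure, the sharing and the cycle are preserved.
shared_preserved = restored["today"] is restored["urgent"]
cycle_preserved = restored["today"][2] is restored["today"]

# The restored list is a copy, not the original object.
is_copy = restored["today"] is not tasks
```

All three flags end up True: pickle tracks objects it has already seen within a single dump, so the restored structure has the same shape as the original.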

About the Author
Patrick O'Brien is a Python programmer, consultant, and trainer. He is the author of PyCrust and a developer on the PythonCard project. He most recently led the PyPerSyst team that ported Prevayler to Python, and continues to lead that project into interesting new territory.