EntityFS architecture overview

Karl Gustafsson

Holocene Software

Revision History
Revision 1.0	2008-01-27

Table of Contents

Introduction
File systems and entities
Locking and multi-threaded access
Entity views
Utility classes
Implementation
Further reading

Introduction

The file system, just like the relational database, is a powerful abstraction for representing structured information. But file system support in Java today consists basically of what is in the java.io.File class: low-level methods for working with path names and for opening streams. A File instance represents the file system path to an entity (a file or a directory), not the entity itself. This means that a File object is not a file object: it does not represent what we normally mean when we talk of a file, and it might even represent the path to a directory. Many File instances may reference the same entity, and File instances may also reference non-existing entities.
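The distinction between a path and an entity can be illustrated with the JDK alone. The following is a minimal sketch; the class name FileIsAPathDemo and the helper sameEntity are made up for this illustration and are not part of EntityFS.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

class FileIsAPathDemo {
    /** Returns true when both File objects resolve to the same canonical path. */
    static boolean sameEntity(File a, File b) {
        try {
            return a.getCanonicalFile().equals(b.getCanonicalFile());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two File instances whose path names differ but lead to the same entity.
        File direct = new File("data.txt");
        File indirect = new File("." + File.separator + "data.txt");

        // File.equals compares path names, not entities.
        System.out.println(direct.equals(indirect));
        // Resolving both paths shows that they reference the same entity.
        System.out.println(sameEntity(direct, indirect));

        // A File object can be created for an entity that does not exist;
        // exists() reports whether anything is actually there.
        File missing = new File("no-such-dir", "no-such-file");
        System.out.println(missing.exists());
    }
}
```

Neither File object has to reference an existing file for the comparison to work, which is itself a sign that File models path names rather than entities.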

In Java, it is not obvious how data containers (i.e. files) should best be represented in programs and APIs. There are several possible options. A file can be represented by a File object, but then it has to be an actual file in a file system. It can be represented by an open InputStream, but then it can only be read from once and, perhaps worse, if the stream is passed as a method parameter, the responsibility for closing the stream often has to lie with a method other than the one that opened it. If the file contents are represented by a String object, the entire contents of the file must be stored in memory while they are used. In other words: Java is lacking a backend-independent file interface.

EntityFS is a software package that attempts to remedy that, and much more. EntityFS helps programmers to work with file systems and their entities (files and directories) in an object-oriented fashion. It provides a large set of tools for manipulating entities, for observing entities and for iterating over entities. The interfaces that client programs use are backend-independent; there are several different file system backend implementations, for instance the RAM file system and the Zip file system. The client program does not have to be hard-wired to any specific implementation.

Entity objects in EntityFS represent the file system entities themselves, not their locations. An entity is represented by the same entity object instance even after the entity has been moved or renamed. The file system implementation ensures that there is only one entity object instance representing the same entity at any given time. Changes made to the entity by one thread are immediately visible to all other threads working with the same entity. EntityFS entities are true entity objects.

File systems and entities

Entities exist within the context of a file system object. An entity object lives its entire life within one and the same file system. It cannot be moved to another file system, only copied. A client gets references to entities through other entities, for instance by getting all child entities of a directory or getting a file's parent directory, or through a file system object, which always has a reference to its root directory. The client never instantiates any entity objects.

There are two standard types of entities, files and directories (capability providers may add more). EFile, the file entity interface, extends the ReadableFile, WritableFile and RandomlyAccessibleFile interfaces. They represent different aspects of the file as a data container. They are easy to mock for testing and can be used wherever the entity properties of a file are not needed. Analogously, Directory implements the EntityHolder and EntityListable interfaces. See below for a graph of the entity object inheritance hierarchies.
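The benefit of a narrow, read-only file aspect can be sketched as follows. The interface ReadableSource and its openForRead method are hypothetical stand-ins invented for this illustration; they are not the EntityFS ReadableFile interface, whose actual methods are described in the API documentation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a "readable file" aspect (not the EntityFS API). */
interface ReadableSource {
    /** Opens a fresh stream on the data. The caller closes it. */
    InputStream openForRead() throws IOException;
}

class ReadableSourceDemo {
    /** Counts the bytes in the source. Opens and closes its own stream. */
    static long countBytes(ReadableSource source) {
        try (InputStream in = source.openForRead()) {
            byte[] buffer = new byte[4096];
            long total = 0;
            int read;
            while ((read = in.read(buffer)) != -1) {
                total += read;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** An in-memory mock, convenient in unit tests. */
    static ReadableSource inMemory(String text) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        return () -> new ByteArrayInputStream(data);
    }

    public static void main(String[] args) {
        ReadableSource mock = inMemory("hello");
        System.out.println(countBytes(mock)); // 5
        // Unlike an already opened InputStream, the source can be read again.
        System.out.println(countBytes(mock)); // 5
    }
}
```

Because the method receiving the source opens the stream itself, it can also be the one that closes it, and the data can be read more than once.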

Every entity in a file system has a unique absolute location that represents its location within the file system, its absolute path. It consists of a list of parent directory names with each name separated by forward slashes, for instance /myDir/mySubDir/myFile. The file system's root directory, its only parentless entity, has the absolute location /. Different entities' locations relative to directories within a file system are represented by relative locations (relative paths).
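How absolute and relative locations relate can be sketched with plain strings. The class LocationSketch and its methods are hypothetical and do not mirror the EntityFS location classes; in particular, using ".." to step up to a parent directory is an assumption made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class LocationSketch {
    /** Splits "/myDir/mySubDir/myFile" into [myDir, mySubDir, myFile]; "/" gives an empty list. */
    static List<String> segments(String absoluteLocation) {
        if (!absoluteLocation.startsWith("/")) {
            throw new IllegalArgumentException("Not an absolute location: " + absoluteLocation);
        }
        List<String> result = new ArrayList<>();
        for (String segment : absoluteLocation.split("/")) {
            if (!segment.isEmpty()) {
                result.add(segment);
            }
        }
        return result;
    }

    /** Returns the location of target relative to the directory at base. */
    static String relativeTo(String base, String target) {
        List<String> from = segments(base);
        List<String> to = segments(target);
        // Skip the leading segments that both locations share.
        int common = 0;
        while (common < from.size() && common < to.size()
                && from.get(common).equals(to.get(common))) {
            common++;
        }
        List<String> parts = new ArrayList<>();
        // Step up from base to the deepest common directory (assumed ".." notation).
        for (int i = common; i < from.size(); i++) {
            parts.add("..");
        }
        parts.addAll(to.subList(common, to.size()));
        return parts.isEmpty() ? "." : String.join("/", parts);
    }

    public static void main(String[] args) {
        System.out.println(segments("/myDir/mySubDir/myFile"));
        System.out.println(relativeTo("/myDir", "/myDir/mySubDir/myFile"));
        System.out.println(relativeTo("/myDir/other", "/myDir/mySubDir/myFile"));
    }
}
```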

Entities and the entire file system are Observable for events, such as the EntityModifiedEvent and the ChildEntityRenamedEvent.
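The general shape of such an observer mechanism can be sketched as below. All names here (ObservableSketch, ModifiedEvent, addObserver, fireModified) are hypothetical; the sketch shows the pattern only and does not reproduce the EntityFS observer interfaces or event classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

class ObservableSketch {
    /** Hypothetical event carrying the location of the modified entity. */
    static final class ModifiedEvent {
        final String location;

        ModifiedEvent(String location) {
            this.location = location;
        }
    }

    // CopyOnWriteArrayList lets observers be added while an event is being delivered.
    private final List<Consumer<ModifiedEvent>> observers = new CopyOnWriteArrayList<>();

    void addObserver(Consumer<ModifiedEvent> observer) {
        observers.add(observer);
    }

    /** Notifies every registered observer, in registration order. */
    void fireModified(String location) {
        ModifiedEvent event = new ModifiedEvent(location);
        for (Consumer<ModifiedEvent> observer : observers) {
            observer.accept(event);
        }
    }

    public static void main(String[] args) {
        ObservableSketch fileSystem = new ObservableSketch();
        List<String> seen = new ArrayList<>();
        fileSystem.addObserver(event -> seen.add(event.location));
        fileSystem.fireModified("/myDir/myFile");
        System.out.println(seen); // [/myDir/myFile]
    }
}
```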

A file system may use an AccessController to implement conditional access to entities.

The file system is created using a file system type-specific file system builder. A file system object has a set of file system-global properties that govern its behavior, for instance its locking strategy (see below). Those properties are set when the file system is created and cannot be modified after that.
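The idea of a builder that produces a file system with fixed properties can be sketched as follows. The class, enum and method names are hypothetical and chosen for this illustration; the real builders and their properties are described in the EntityFS API documentation.

```java
class FileSystemBuilderSketch {
    /** Hypothetical file system-global property. */
    enum LockingPolicy { READ_WRITE_LOCKING, NO_LOCKING }

    /** The created object has only final fields and no setters. */
    static final class FileSystem {
        private final String name;
        private final LockingPolicy lockingPolicy;

        private FileSystem(String name, LockingPolicy lockingPolicy) {
            this.name = name;
            this.lockingPolicy = lockingPolicy;
        }

        String getName() {
            return name;
        }

        LockingPolicy getLockingPolicy() {
            return lockingPolicy;
        }
    }

    /** Collects the properties, then creates the immutable file system object. */
    static final class Builder {
        private String name = "unnamed";
        private LockingPolicy lockingPolicy = LockingPolicy.READ_WRITE_LOCKING;

        Builder setName(String name) {
            this.name = name;
            return this;
        }

        Builder setLockingPolicy(LockingPolicy lockingPolicy) {
            this.lockingPolicy = lockingPolicy;
            return this;
        }

        FileSystem create() {
            return new FileSystem(name, lockingPolicy);
        }
    }

    public static void main(String[] args) {
        FileSystem fs = new Builder()
                .setName("scratch")
                .setLockingPolicy(LockingPolicy.NO_LOCKING)
                .create();
        System.out.println(fs.getName() + " " + fs.getLockingPolicy());
    }
}
```

Keeping all mutable state in the builder means that the properties cannot change once create has returned.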

Additional functionality can be added to a file system when creating it by adding new entity and/or file system capabilities. Some capabilities are specific to one file system implementation (symbolic links in the RAM file system, for instance), while other capabilities can be used by all or most file system implementations (GZip compression of file data, for instance). A file system implementation might also have capabilities of its own. The file system-backed file system implementation, for instance, is file-resolvable and persistent.

Locking and multi-threaded access

EntityFS file systems are designed from the ground up to be used concurrently by several threads. Every file system instance has a configurable locking policy for entities. For read/write file systems, the default locking policy is that entities have to be locked for writing by updating threads and locked for reading by reading threads. The read/write locks have the same semantics as Java's ReadWriteLock.
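The ReadWriteLock semantics referred to above can be shown with the JDK's ReentrantReadWriteLock. The class LockedEntitySketch is a hypothetical stand-in for an entity; how EntityFS entities expose their locks is not shown here.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockedEntitySketch {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private String contents = "";

    /** Several threads may hold the read lock at the same time. */
    String read() {
        lock.readLock().lock();
        try {
            return contents;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** The write lock is exclusive: it excludes both readers and other writers. */
    void write(String newContents) {
        lock.writeLock().lock();
        try {
            contents = newContents;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockedEntitySketch entity = new LockedEntitySketch();
        Thread writer = new Thread(() -> entity.write("updated"));
        writer.start();
        writer.join();
        System.out.println(entity.read()); // updated
    }
}
```

Releasing the lock in a finally block ensures that it is released even if the guarded code throws.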

Every application is responsible for implementing its own strategy for the order in which entity locks are acquired. The entity objects themselves do not assume any particular strategy.

For single-threaded usage scenarios or for read-only file systems, entity locking can be disabled altogether.

Entity views

Some entity types, directories for instance, are view-capable. They support entity views that use filters to hide certain child entities. A view can for instance be used to show only files with the extension .xml in a directory. Views can also be nested.

When working with views, view-capable entities returned from view methods inherit the settings of the current view. For instance, when calling listEntities on a view of a directory containing both files and directories, the directories returned will be views with the same view settings (same entity filter instances) as the current view, but the files returned will be file entity objects since they are not view-capable. See the programmer's guide for examples.
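The inheritance of view settings can be sketched with a small in-memory model. Everything in this sketch (ViewSketch, Entity, DirectoryView and their methods) is hypothetical and far simpler than the EntityFS view API; it only illustrates that subdirectories come back as views sharing the same filter instance, while files come back unwrapped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class ViewSketch {
    /** Hypothetical entity: a named file or directory with children. */
    static final class Entity {
        final String name;
        final boolean directory;
        final List<Entity> children = new ArrayList<>();

        Entity(String name, boolean directory) {
            this.name = name;
            this.directory = directory;
        }

        Entity add(Entity child) {
            children.add(child);
            return this;
        }
    }

    /** Lets directories through and hides every file that is not an .xml file. */
    static final Predicate<Entity> XML_ONLY = e -> e.directory || e.name.endsWith(".xml");

    static final class DirectoryView {
        private final Entity directory;
        private final Predicate<Entity> filter;

        DirectoryView(Entity directory, Predicate<Entity> filter) {
            this.directory = directory;
            this.filter = filter;
        }

        /** Files are not view-capable, so they are returned as plain names here. */
        List<String> visibleFileNames() {
            List<String> names = new ArrayList<>();
            for (Entity child : directory.children) {
                if (!child.directory && filter.test(child)) {
                    names.add(child.name);
                }
            }
            return names;
        }

        /** Subdirectories are returned as views that share this view's filter instance. */
        List<DirectoryView> visibleDirectories() {
            List<DirectoryView> views = new ArrayList<>();
            for (Entity child : directory.children) {
                if (child.directory && filter.test(child)) {
                    views.add(new DirectoryView(child, filter));
                }
            }
            return views;
        }

        /** A nested view hides everything this view hides, plus what the extra filter rejects. */
        DirectoryView nested(Predicate<Entity> extra) {
            return new DirectoryView(directory, filter.and(extra));
        }
    }

    static Entity sampleTree() {
        Entity sub = new Entity("sub", true)
                .add(new Entity("c.xml", false))
                .add(new Entity("d.txt", false));
        return new Entity("root", true)
                .add(new Entity("a.xml", false))
                .add(new Entity("b.txt", false))
                .add(sub);
    }

    public static void main(String[] args) {
        DirectoryView view = new DirectoryView(sampleTree(), XML_ONLY);
        System.out.println(view.visibleFileNames());                             // [a.xml]
        System.out.println(view.visibleDirectories().get(0).visibleFileNames()); // [c.xml]
    }
}
```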

Utility classes

Entity object interfaces are designed to be as small as possible. They are augmented by utility classes that provide more, and perhaps more programmer-friendly, methods than the entity objects themselves do. Compare this with the static methods in the java.util.Collections class.

Care has been taken to ensure that the static methods do not implement any policy on how entity objects should work; they are only tools for working with entities. The only violation of that rule is that the utility classes use a top-down, hash code order entity locking strategy. If that is not appropriate for your application, use the entity objects directly instead.
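Acquiring locks in a fixed global order is a common way to avoid deadlocks when several threads lock the same set of objects. The sketch below orders locks by identity hash code only. It is not the EntityFS implementation: the names OrderedLockingSketch, Entity and withLocks are hypothetical, and the top-down (parent before child) part of the strategy is left out.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

class OrderedLockingSketch {
    /** Hypothetical entity that carries its own lock. */
    static final class Entity {
        final String name;
        final ReentrantLock lock = new ReentrantLock();

        Entity(String name) {
            this.name = name;
        }
    }

    /**
     * Locks the entities in ascending identity hash code order, runs the
     * action, and then unlocks them in reverse order.
     */
    static void withLocks(Runnable action, Entity... entities) {
        Entity[] sorted = entities.clone();
        Arrays.sort(sorted, (a, b) ->
                Integer.compare(System.identityHashCode(a), System.identityHashCode(b)));
        int acquired = 0;
        try {
            for (Entity entity : sorted) {
                entity.lock.lock();
                acquired++;
            }
            action.run();
        } finally {
            // Release only the locks that were actually acquired.
            for (int i = acquired - 1; i >= 0; i--) {
                sorted[i].lock.unlock();
            }
        }
    }

    public static void main(String[] args) {
        Entity source = new Entity("source");
        Entity target = new Entity("target");
        // Both calls lock in the same order, whatever the argument order is.
        withLocks(() -> System.out.println("copy source to target"), source, target);
        withLocks(() -> System.out.println("copy target to source"), target, source);
    }
}
```

Identity hash codes are not guaranteed to be unique, so a production implementation would need a tie-breaking rule for two objects with equal hash codes.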

Utility classes include Files for working with files and Directories for working with directories.

Implementation

Figure 1. EntityFS implementation layer overview

After selecting which backend to use, client applications work against the EntityFS APIs and the EntityFS capability APIs. The EntityFS implementation is generic and backend-independent. Capabilities plug in to add more functionality. A capability may also plug in to any backend adapter implementation, making the capability backend-specific.