Shape Thread on Software Methodologies
I’m on the SHAPE forum for software engineering management, and asked which methodology is most appropriate for which situation.
I described three situations, and then realized there weren’t any factors I could weight for a heuristic. So I made some up, starting from a list I already had of factors that go into estimating software projects.
In the process, I’ve also attempted to list every commonly used software methodology and/or process framework. I’m certain I’ve missed some (such as ICONIX), but at least this is a good start.
Different people made different recommendations. The reasons they gave were at least as interesting as the recommendations themselves.

I’ve also found a couple of books, notably Balancing Agility and Discipline, which deals explicitly with people who are on the fence about software methodologies. I’ll come back and update this post as I get a better picture.
Importing from Excel
For the last project I worked on, we depended on data from the business users. There wasn’t very much data, but it was highly variable and was being revised in an Excel spreadsheet on a daily basis. Not only that, but it was changing structurally as well: new rows and columns would be added, and links between workbooks were used. We couldn’t use a snapshot, because the snapshot would be obsolete in short order, and the business users couldn’t work with a database that would restrict them.
In the past, the business users had transcribed their Excel data into the database one row at a time through a web or Swing UI. We needed something faster this time around: a way to import the Excel spreadsheet directly.
This idea had appeal on an intuitive level. The business users could work in the structure they were most experienced with, and we could get fresh data almost as soon as they’d revised it. We knew that there were some data validation issues where values had to be restricted to a list, but we thought that Excel’s validation features would cope with that.
I looked into reading Excel’s XML dialect, but quickly discarded the idea after eyeballing the XML that Excel generated. Instead, I looked for a library that would do the conversion for me and present an easy-to-use API.
There was only one freely available Java library that handled Excel spreadsheets, and that was POI. POI is a library written specifically to handle Microsoft Office formats, and its Excel component was called HSSF (Horrible SpreadSheet Format).
On the database side of things, we had an advantage in that we were using ATG’s persistence solution, the SQL repository (or Data Anywhere Architecture, as marketing calls it). Because the SQL repository can import data from XML, we could convert the Excel spreadsheet to an XML file and then run startSQLRepository to import it.
The documentation for HSSF was eclectic, but serviceable. Once I’d found the section on reading an Excel file, I could iterate through the rows and call row.getCell(0) to get the first cell in each row.
Once I’d done this, I copied the data from each row to a JavaBean, passed a list of those objects to the export class, and generated XML from them.
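Once the rows are JavaBeans, the export is mostly string building. The sketch below is illustrative, not the original code: `ProductRow` and `RepositoryXmlExporter` are hypothetical names, each row is assumed to have already been read out of HSSF into a `String[]`, and the `<add-item>`/`<set-property>` element names are an assumption about the SQL repository’s XML import format, so check them against the ATG documentation for your version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical JavaBean holding one spreadsheet row.
class ProductRow {
    private final String id;
    private final String name;
    private final String color;

    ProductRow(String id, String name, String color) {
        this.id = id;
        this.name = name;
        this.color = color;
    }

    String getId() { return id; }
    String getName() { return name; }
    String getColor() { return color; }
}

class RepositoryXmlExporter {
    // Copies each row (already extracted from the workbook as a String[]) into a bean.
    static List<ProductRow> toBeans(List<String[]> rows) {
        List<ProductRow> beans = new ArrayList<>();
        for (String[] cells : rows) {
            beans.add(new ProductRow(cells[0], cells[1], cells[2]));
        }
        return beans;
    }

    // Writes the beans as repository import XML (element names are an assumption).
    static String toXml(String itemDescriptor, List<ProductRow> beans) {
        StringBuilder xml = new StringBuilder("<gsa-template>\n");
        for (ProductRow bean : beans) {
            xml.append("  <add-item item-descriptor=\"").append(escape(itemDescriptor))
               .append("\" id=\"").append(escape(bean.getId())).append("\">\n");
            appendProperty(xml, "displayName", bean.getName());
            appendProperty(xml, "color", bean.getColor());
            xml.append("  </add-item>\n");
        }
        return xml.append("</gsa-template>\n").toString();
    }

    private static void appendProperty(StringBuilder xml, String name, String value) {
        xml.append("    <set-property name=\"").append(name).append("\">")
           .append(escape(value)).append("</set-property>\n");
    }

    // Minimal XML escaping so spreadsheet text like "R&D" doesn't break the file.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;");
    }
}
```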
There were a number of issues that came up while implementing this solution. The first was that POI is abandonware; the last update was from 2004. Using some Excel XP features (validation of specific cells, for example) causes NullPointerExceptions when POI tries to parse the file. The solution here was to not use those features.
Another issue was that the Excel data was linked between different workbooks, but only loosely. Because all the fields were text, any text that was not specifically validated had to match exactly between workbooks. Identifying data mismatches between workbooks was a time-consuming process, and I didn’t think of a good way to automate it.
Finally, as new data was added, columns would be moved around or added as the business users tweaked the spreadsheet. Sometimes I would be notified of the change, and sometimes I would find out when the script failed.
Importing data directly from an Excel spreadsheet is a good solution in many circumstances. However, it’s probably a good idea to have a ‘business’ spreadsheet that can be quickly modified, and a separate ‘data’ spreadsheet with fixed columns used only for data export. Given a static schema, the import is easy to use and extend, and much easier to modify than hand-editing XML.
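Reading cells by position is what made the script fail when columns moved. A minimal sketch of a more forgiving approach, assuming the data sheet’s first row holds column names and that rows have already been pulled out of HSSF as `String[]` (`ColumnMap` is a hypothetical helper): columns are looked up by header name, so a moved column still works, and a missing one fails immediately with a message naming it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Resolves column positions from the header row instead of hardcoding indexes.
class ColumnMap {
    private final Map<String, Integer> indexByName = new HashMap<>();

    ColumnMap(String[] headerRow, List<String> requiredColumns) {
        for (int i = 0; i < headerRow.length; i++) {
            indexByName.put(headerRow[i].trim(), i);
        }
        // Fail fast, naming every missing column, rather than failing mid-import.
        List<String> missing = new ArrayList<>();
        for (String name : requiredColumns) {
            if (!indexByName.containsKey(name)) {
                missing.add(name);
            }
        }
        if (!missing.isEmpty()) {
            throw new IllegalStateException("Spreadsheet is missing columns: " + missing);
        }
    }

    // Returns the cell under the named column, or "" if the row is short.
    String get(String[] row, String columnName) {
        Integer index = indexByName.get(columnName);
        if (index == null) {
            throw new IllegalArgumentException("Unknown column: " + columnName);
        }
        return index < row.length ? row[index] : "";
    }
}
```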
Autogenerated Car Manager
Okay. I asked earlier about the benefits of a strongly typed manager class. The general consensus was that:
1) Weakly typed repository items (where you have to call getPropertyValue(“price”) and cast to the appropriate type) suck.
2) Strongly typed objects (where there’s something like getPrice() that has more logic and intelligence associated with it) do not suck, but the code involved with managing complex objects can get so insanely complicated that the possibility of bugs is actually higher than just using repository items.
3) In addition, strong typing can involve a lot of boring manual typing unless a tool is used to autogenerate the type wrappers.
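To make the difference concrete, here is a small sketch. `MapItem` is a hypothetical stand-in for a repository item (the real ATG call is `RepositoryItem.getPropertyValue(String)`), and `CarItemWrapper` is the kind of strongly typed wrapper a generator would produce:

```java
import java.util.HashMap;
import java.util.Map;

// Weakly typed stand-in: every property comes back as Object.
class MapItem {
    private final Map<String, Object> properties = new HashMap<>();

    Object getPropertyValue(String name) { return properties.get(name); }
    void setPropertyValue(String name, Object value) { properties.put(name, value); }
}

// Strongly typed wrapper: the property name and the cast live in exactly one place.
class CarItemWrapper {
    private final MapItem item;

    CarItemWrapper(MapItem item) { this.item = item; }

    Double getPrice() { return (Double) item.getPropertyValue("price"); }

    // Logic can sit next to the data instead of being repeated at every call site.
    boolean isCheaperThan(double limit) {
        Double price = getPrice();
        return price != null && price < limit;
    }
}
```

With the weak version every caller writes `(Double) item.getPropertyValue("price")`, and a misspelled property name compiles fine and only fails at run time. The wrapper confines the name and the cast to one method.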
From this input, I can think of some good use cases for strongly typed patterns:
1) Where the expectations for the manager are well defined. In this circumstance, there is only one thing that the manager can do and it is not expected to do anything else. Relationships between items are not expected to change and the interface is effectively frozen.
2) The manager is not frozen and may be extended, but the interface is controlled by an in-house team of engineers. In this case, the manager is not bound to an external framework, and the engineers can modify the manager to fit their needs exactly.
3) The manager is expected to be extended, and will be used as part of an external framework. However, the engineers designing the manager are all geniuses, have designed it to be easy to extend, and can cover every possible use case.
However, anything that can ease the pain of creating type-safe wrappers is a plus, so I’ve poked through the “repository-to-java” code and updated the Car project to use it. The ant build file will run out of the box and I’ve included comments on what worked and what didn’t.
Maintaining a separation of concerns between the view and the manager is still important to me, and so I’ve kept the Car interface around even though strictly speaking I could use the autogenerated CarFacade interface. It’s easy enough to copy the method definitions from one class to the other.
The only things that bug me about the current solution are the EJB implementation methods and exceptions. I really only want to see CarExceptions, and I don’t like it when implementation details leak.
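One way to keep the EJB details out of sight is a thin adapter that implements the hand-written Car interface and translates exceptions. This is a sketch under assumptions: `CarFacade` here is a hypothetical one-method stand-in for the autogenerated facade, and the real generated interface may throw other exceptions besides `RemoteException`.

```java
import java.rmi.RemoteException;

class CarException extends Exception {
    CarException(String message, Throwable cause) { super(message, cause); }
}

// Hypothetical stand-in for the autogenerated facade, which leaks RemoteException.
interface CarFacade {
    String getColor(String id) throws RemoteException;
}

// The view-facing interface only ever sees CarException.
interface Car {
    String getColor(String id) throws CarException;
}

class CarFacadeAdapter implements Car {
    private final CarFacade facade;

    CarFacadeAdapter(CarFacade facade) { this.facade = facade; }

    public String getColor(String id) throws CarException {
        try {
            return facade.getColor(id);
        } catch (RemoteException e) {
            // Keep the original as the cause so nothing is lost for debugging.
            throw new CarException("Could not load color for car " + id, e);
        }
    }
}
```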
Anyway. The autogenerated code works perfectly, even though there are a few bits I still don’t understand (like the role of CarFacadeHome and the method delegation through the wrappers). I’ve marked those bits so people can possibly use them for something. The configuration.xml file is complex, but the DTD goes a long way toward making it understandable. See what you think.
Car project
Well, I was going to write a big response to Repository Patterns. Somewhere along the line, it turned into its own project. Read it here.
It’s pretty good. It’s got the main concepts down, plus it’s got OperationDemarcation. OperationDemarcation is the new bacon. I wish I’d thought of it years ago. Plus a few utility classes like UserMessages, BeanUtils, and real unit tests instead of the toy examples on ATG mockobjects.
I would like to give a shout out to Enterprise Architect. EA has done the impossible: it makes creating UML diagrams fun. It’s also at least 25 times cheaper than Together/J or Rational Rose. And it doesn’t crash like Rational Rose either.
The javadoc comments provide more detail about the purpose of each class. Note that the benefit of the manager class is not that you can replace the repository entirely, but that you have the flexibility to decide when and how to use it. You might want to cache some data, make some remote calls somewhere else, consult an authentication service… who knows. Putting a layer in from the beginning ensures that any changes to the repository don’t have to propagate to other areas of the application.
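As one example of that flexibility, caching can be added as a decorator without the callers noticing. A minimal sketch, assuming a cut-down `CarManager` with only `getCar` and hypothetical class names; the plain `HashMap` has no expiry or invalidation and isn’t thread-safe, so in a real application it would run straight into the stale-data problem:

```java
import java.util.HashMap;
import java.util.Map;

class CarException extends Exception {
    CarException(String message) { super(message); }
}

class Car {
    final String id;
    final String color;

    Car(String id, String color) {
        this.id = id;
        this.color = color;
    }
}

// Cut-down manager interface for this sketch.
interface CarManager {
    Car getCar(String id) throws CarException;
}

// Decorator: callers still see a CarManager, but repeat lookups skip the delegate.
class CachingCarManager implements CarManager {
    private final CarManager delegate;
    private final Map<String, Car> cache = new HashMap<>();

    CachingCarManager(CarManager delegate) { this.delegate = delegate; }

    public Car getCar(String id) throws CarException {
        Car car = cache.get(id);
        if (car == null) {
            car = delegate.getCar(id);
            if (car != null) {
                cache.put(id, car); // not-found results are not cached
            }
        }
        return car;
    }
}
```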
Let me know what you think, and I’ll update the project as needed.
Repository Patterns
For some years, I’ve been using a pattern to abstract away access to a repository and make it more manageable. Here’s how I do it. I understand that this looks like (and probably is) DAO, but not everyone has heard of it, and I think that a concrete example is more helpful than a three-letter acronym.
So. You have some custom information that you need to store in a repository. You know that you’re going to need at least one new item-descriptor, and you’re going to have to expose that data to the rest of the system.
The item descriptor looks like this:
<item-descriptor name="car">
  <property name="displayName" column-name="name" data-type="string"/>
  <property name="color" column-name="color" data-type="string"/>
  <property name="price" column-name="price" data-type="double"/>
</item-descriptor>
And you want to be able to search for cars, create new cars, modify existing ones, etc.
First thing to do is to create a JavaBean for the item descriptor:
public class Car {
protected String mId;
protected String mDisplayName;
protected String mColor;
protected Double mPrice;
public Car() {
}
// get and set methods
}
Note that the variables are protected rather than private. I do this because someone after me may need access to those variables in a subclass. I don’t advocate it, but it’s up to them to make the call if it’s necessary. There is also no link directly to the repository item.
Then create a manager that works with these Car objects and throws exceptions at the same level of abstraction.
public interface CarManager {
public Car getCar(String pId) throws CarException;
public void deleteCar(String pId) throws CarException;
public void updateCar(String pId, Car pCar) throws CarException;
// Known searches
public Car[] getCarsByColor(String pColor) throws CarException;
public Car[] getCarsByPrice(Double pPrice) throws CarException;
}
One advantage of explicitly defining an interface is that clients can write against the interface even when the implementation is still incomplete. This means that you can write a mock object that implements that interface and hook your business logic to it for unit tests.
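For example, a mock might look like the following sketch. It assumes a cut-down `Car` bean (color and price only) and the `CarManager` interface above; `MockCarManager` is a hypothetical name, and its `updateCar` doubles as an insert so that tests can seed data:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Predicate;

class Car {
    private String mColor;
    private Double mPrice;

    public String getColor() { return mColor; }
    public void setColor(String pColor) { mColor = pColor; }
    public Double getPrice() { return mPrice; }
    public void setPrice(Double pPrice) { mPrice = pPrice; }
}

class CarException extends Exception {
    CarException(String message) { super(message); }
}

interface CarManager {
    Car getCar(String pId) throws CarException;
    void deleteCar(String pId) throws CarException;
    void updateCar(String pId, Car pCar) throws CarException;
    Car[] getCarsByColor(String pColor) throws CarException;
    Car[] getCarsByPrice(Double pPrice) throws CarException;
}

// In-memory stand-in for CarManagerImpl: no repository, no transactions.
class MockCarManager implements CarManager {
    private final Map<String, Car> mCars = new HashMap<>();

    public Car getCar(String pId) throws CarException {
        requireId(pId);
        return mCars.get(pId);
    }

    public void deleteCar(String pId) throws CarException {
        requireId(pId);
        mCars.remove(pId);
    }

    // Acts as an upsert so tests can seed data; the real manager only updates.
    public void updateCar(String pId, Car pCar) throws CarException {
        requireId(pId);
        if (pCar == null) throw new CarException("null pCar");
        mCars.put(pId, pCar);
    }

    public Car[] getCarsByColor(String pColor) {
        return find(car -> Objects.equals(car.getColor(), pColor));
    }

    public Car[] getCarsByPrice(Double pPrice) {
        return find(car -> Objects.equals(car.getPrice(), pPrice));
    }

    private static void requireId(String pId) throws CarException {
        if (pId == null || pId.isEmpty()) throw new CarException("null pId");
    }

    // Always returns an array (possibly empty), never null.
    private Car[] find(Predicate<Car> pMatch) {
        List<Car> found = new ArrayList<>();
        for (Car car : mCars.values()) {
            if (pMatch.test(car)) found.add(car);
        }
        return found.toArray(new Car[0]);
    }
}
```

A unit test can then hand `new MockCarManager()` to the business logic wherever it expects a `CarManager`.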
After defining the interface, you create the manager to translate between repository items and domain objects:
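import java.text.MessageFormat;
import javax.transaction.TransactionManager;
import atg.dtm.TransactionDemarcation;
import atg.dtm.TransactionDemarcationException;
import atg.nucleus.GenericService;
import atg.repository.Repository;
import atg.repository.RepositoryException;
import atg.repository.RepositoryItem;
// StringUtils: any helper with a static isEmpty(String), e.g. Apache Commons Lang's.
// Extending GenericService provides isLoggingDebug() and logDebug(); the getters for the
// transaction manager and repository, copyProperties() and CAR_ITEM_DESC_NAME are omitted.
public class CarManagerImpl extends GenericService implements CarManager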
{
public Car getCar(String pId) throws CarException
{
if (isLoggingDebug())
{
String msg = "getCar: pId = {0}";
Object[] params = { pId };
msg = MessageFormat.format(msg, params);
logDebug(msg);
}
if (StringUtils.isEmpty(pId))
{
throw new CarException("null pId");
}
String repositoryId = pId;
try
{
boolean rollback = true;
TransactionManager tm = getTransactionManager();
TransactionDemarcation td = new TransactionDemarcation();
// Make sure that a transaction exists before we do anything.
td.begin(tm, TransactionDemarcation.MANDATORY);
try
{
Repository r = getRepository();
RepositoryItem item = r.getItem(repositoryId, CAR_ITEM_DESC_NAME);
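if (item == null)
{
// Not finding a car is not a failure, so don't roll back the caller's transaction.
rollback = false;
return null;
}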
Car car = new Car();
car.setId(repositoryId);
copyProperties(item, car);
rollback = false;
return car;
} finally
{
td.end(rollback);
}
} catch (RepositoryException re)
{
throw new CarException(re);
} catch (TransactionDemarcationException tde)
{
throw new CarException(tde);
}
}
}
There are a number of things going on in the code above. First, the manager takes responsibility for logging debug information. Yes, you can turn on debugging in an item-descriptor directly, but that will spit out debug output for every access of that item-descriptor. Logging at the manager level allows for more direct and customized control.
Second, the manager handles transaction management. It may be the case that a transaction already exists before this method is called, but that’s not the important bit. The important bit is that an existing transaction gets rolled back if this method fails. I can’t stress this enough: almost nothing is as frustrating as a method that throws an exception but still commits bad data to the repository.
Third, the manager does not expose RepositoryException. Instead, exceptions are nested inside a CarException and rethrown so that the client can decide how to deal with the error. I used to explicitly log errors in the manager, but that got tedious as the applications scaled up and every single component logged the same error at different levels. So now I have a simple rule: if you catch an exception and don’t rethrow it, you are responsible for logging it. This typically means that the form handler or droplet at the UI end of the chain catches the exception, logs it to the console and adds a form exception for user display.
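A sketch of the rule in plain Java, with hypothetical class names and `java.util.logging` standing in for Nucleus logging. A `NumberFormatException` plays the part of the `RepositoryException`: the manager wraps and rethrows without logging, and the form-handler-like class catches it, logs it once, and adds a user-facing message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

class CarException extends Exception {
    CarException(String message, Throwable cause) { super(message, cause); }
}

// Manager layer: wraps and rethrows, so it does NOT log.
class PricingManager {
    double parsePrice(String raw) throws CarException {
        try {
            return Double.parseDouble(raw);
        } catch (NumberFormatException e) {
            throw new CarException("Bad price: " + raw, e);
        }
    }
}

// UI layer (the form handler's role): catches without rethrowing, so it logs, once.
class PriceFormHandler {
    private static final Logger LOG = Logger.getLogger(PriceFormHandler.class.getName());
    private final PricingManager manager = new PricingManager();
    final List<String> formErrors = new ArrayList<>();

    boolean handleSubmit(String rawPrice) {
        try {
            manager.parsePrice(rawPrice);
            return true;
        } catch (CarException e) {
            LOG.log(Level.WARNING, "Price update failed", e);
            formErrors.add("Please enter a valid price.");
            return false;
        }
    }
}
```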
So this handles the basic case. Let’s see what happens with modification:
public void updateCar(String pId, Car pCar) throws CarException {
// logging code
// Check input for nulls
// transaction code wrapper
MutableRepository mr = (MutableRepository) getRepository();
MutableRepositoryItem mutItem = mr.getItemForUpdate(pId, CAR);
// Copies all the public properties from pCar to the mutable repository item
copyProperties(pCar, mutItem);
mr.updateItem(mutItem);
}
Much like you’d expect, except for a couple of points. I explicitly pass an id for the update. I could put the id in the Car object, but that would get confusing, as we are using the Car here as a bag of data to apply, not as a query. The “copyProperties” code is actually not hard: you can leverage the DynamicBeans API with an array of public property names to get and set property values from one object to the other. However, this is a blind copy: you can’t selectively change a single property value with this method. Usually this isn’t a problem (in forms that display all properties at once), but it’s always possible to use a key-value approach or otherwise allow for more selective updates.
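Here is the blind copy sketched with plain Java reflection in place of ATG’s DynamicBeans (a swapped-in technique, not the original code). A `Map` stands in for the `MutableRepositoryItem`, and `CAR_PROPERTIES` is a hypothetical array of the public property names to copy:

```java
import java.lang.reflect.Method;
import java.util.Map;

class BeanCopier {
    // Hypothetical list of the public properties that map onto the item descriptor.
    static final String[] CAR_PROPERTIES = {"displayName", "color", "price"};

    // Blind copy: every listed property is written to the item, nulls included.
    static void copyProperties(Object bean, String[] propertyNames, Map<String, Object> item)
            throws ReflectiveOperationException {
        for (String name : propertyNames) {
            String getterName = "get" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
            Method getter = bean.getClass().getMethod(getterName);
            item.put(name, getter.invoke(bean));
        }
    }
}

class CarBean {
    private String displayName;
    private String color;
    private Double price;

    public String getDisplayName() { return displayName; }
    public void setDisplayName(String pDisplayName) { displayName = pDisplayName; }
    public String getColor() { return color; }
    public void setColor(String pColor) { color = pColor; }
    public Double getPrice() { return price; }
    public void setPrice(Double pPrice) { price = pPrice; }
}
```

Note that an unset property on the bean overwrites whatever the item held with `null`; that is exactly the blind-copy behavior described above, and the reason selective updates need a different approach.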
The advantage of updateCar(String, Car) is that the Car can carry as much data as needed. If you write updateCar(String pId, String pColor, Double pPrice) instead, you have to change the interface every time you add a property to the item-descriptor.
Finally, there’s the searching. RQL statements are very useful in this context, as they can be defined in the component properties, and it’s much less work to get data in and out of them.
public Car[] getCarsByColor(String pColor) throws CarException {
// logging debug code
// Check input for nulls
// assume this happens in a transaction code wrapper..
// this is set in the properties file as "carsByColor=color = ?0"
RqlStatement carsByColor = getCarsByColor();
Object[] params = { pColor };
Repository r = getRepository();
RepositoryView view = r.getView(CAR);
RepositoryItem[] items = carsByColor.executeQuery(view, params);
// always return a zero-length array rather than null, as this makes client access less fiddly...
if (items == null) {
if (isLoggingDebug()) { ... }
return new Car[0];
}
Car[] cars = convertItemsToCars(items);
return cars;
}
There are a number of complexities that can arise with this pattern. I don’t cover what happens when you have items that reference other repository items. I don’t cover the memory bloat that this pattern can cause with large repositories. And I don’t cover the ‘stale data’ problem that happens if you keep references to the Car object. All of these problems are solvable, but the best solution depends on the circumstances.
Organizing by schema
If you have an enterprise level website, then there are several sections to it.
There’s the user data. Shipping addresses, billing addresses, customer preferences and passwords.
There’s the order data. This consists of credit card data (associated with an order’s shipping address), shipping information, handling instructions, line items and associated prices, taxes, etc.
There’s the catalog data. This includes the product, category and SKU information, as well as the templates used to display all of the above on the website, and the related media.
There’s the promotion data. This covers stuff like “two for one deals”, “15% off the items if you buy three” and the like. There are also store coupons which are available.
Finally, there’s all the miscellaneous data (search data, company jobs, store locations and administration functions).
So there are many, many tables that go into the making of a website. Sorting out all these tables can be a real headache.
Something that Dan Brandt showed me was the idea of organizing tables by schema. The first time I saw this, I didn’t see the point. They’re all tables. Most of them are related in some way. Why complicate things by putting different tables into different schemas?
But Dan was right. Schemas are the closest thing to namespaces in the database world. They allow database settings (rollback segments, tablespace size, etc.) to be assigned per type of data, which matters because different types of data grow at different rates. They allow tables to be swapped out, because you can point from one schema to another. They allow for better security, because you can apply the same security roles to everything in a schema.
And finally, they allow for a clear separation of concerns. You don’t have to filter the order tables out of the profile schema, because they’re simply not there.
The downside is that you have to be more clever about how to arrange your DDL scripts. This took some fiddling around with Ant, but was well worth it.
We didn’t want to hardcode our schema names (at one point in development they were changing every week), so we used an Ant copy task with a filter that replaced tokens such as ${order.schema.name} with the appropriate Ant property value when deploying the application.
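Ant’s filtering does this substitution for you at copy time. To show what the filter is doing, here is the same token replacement sketched in plain Java; `SchemaFilter` is hypothetical and the table name is made up. Ant typically leaves an unresolved token in place, whereas this sketch fails instead, which surfaces a missing property immediately.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SchemaFilter {
    private static final Pattern TOKEN = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces ${name} tokens in a DDL script with values from the build properties.
    static String expand(String ddl, Properties props) {
        Matcher m = TOKEN.matcher(ddl);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.getProperty(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for " + m.group(0));
            }
            // quoteReplacement stops '$' or '\' in the value being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```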
This change fits into a larger Ant build process that was in wide use at ATG for a number of years. Create and delete scripts are stored in specific directories and run in order from a list. We also took a snapshot of the repositories using startSQLRepository and dumped all the data out to XML files that we could later load into the newly created tables. The end result of this work was that we could completely destroy, recreate and populate the entire database with a single Ant command as part of a daily build.
Missing The Point
I haven’t written much about programming lately.
The main reason is that every time I start writing, I feel that I’m Missing The Point. I don’t know what point I’m missing, but I have the sense that it’s obvious. Or it should be. And writing about a code optimization technique is just a pointless hack.
I think part of it could be that I’m just getting old. I remember when CORBA came out, I was so excited. And then EJB (clearly more of an answer to CORBA than it was to any kind of persistence solution). And now there’s SOAP. The common factor in all of these technologies is that they’re not simple, and the implementations are hard to get right. And then they don’t work with each other. The users complain back to the vendors, who go back to the spec committee. A new spec is written, but there are already fixes put in place which must be worked around. Versioning becomes an issue. Newer specs try to define how to deal with older versions of the spec. And so on. The organizations which avoided this problem did so by defining one implementation. If you use X’s RPC model or RMI (or most Apple stuff), you only use one implementation, so it All Just Works. But every interoperable system has to deal with incredible amounts of crap, and generally speaking they don’t do a good job of shielding that crap from the user.
Not only that, but these systems typically don’t realize when they don’t work. There’s no way to open the hood up on your typical implementation and disable or custom write a small section of code. In most cases, the public API of the system will be well documented and well written, but in my experience the internal interfaces of most implementations (especially the first few versions) are confused, and even mislabeled. These systems don’t fail gracefully, and they can’t be fixed.
And really, fixing bugs is what I spend most of my time doing. I like to think of myself as a programmer, but maybe a more accurate term would be “bug-avoider.” I can count out my hours in terms of bugs avoided while writing, and bugs fixed after writing. When I first started coding, I ran into every single damn bug head on, and I had no ability to see bugs on the page. I stared at the code, but I still couldn’t see bugs. I knew there were bugs in there. But I was missing something I was staring right at. So I thought “if I’m always going to write bugs into my code, I may as well write code so it’s easy to take the bugs out.”
And that’s the problem here. There’s no tolerance for failure built into the specifications. There’s no acknowledgement in the implementations that parts of the spec will have been misunderstood and may need to be tweaked by the user. There’s no lower level debugging API available, no diagnostics and no logging. It’s not that the implementations are at fault – no-one said that they had to do this stuff. It’s the specification’s fault for not defining a diagnostic API along with the main spec.
This is something the industry does all the time. It consistently gives users what they ask for, instead of what they want. Products are measured by their features instead of by their bug-avoidance factor. But I’d choose a more limited product with a solid test suite and solid logging and diagnostics over a glossy black box with no access to the internals any day.
I wonder sometimes if I’m living in the same industry that I read about in the periodicals and on the blogs. My problems are not going to be fixed by a new scripting language or programming trick. Those are fun to read about, but they are not on the critical path that helps me get my job done. And I almost never hear about bug removal techniques, theorem provers or the importance of writing maintainable code. Or a zen discussion on what “maintenance” really is, or how “maintenance-nature” can be revealed. I want more books like TCP/IP Illustrated, Volume 2: The Implementation and TeX: The Program. I want a book about how to recognize broken code and another book about how to fix it.
That’s part of the point, but it’s not all of it. I feel like the answers I’m looking for are just shadows cast by a larger question. I’m going to the AYE conference tomorrow in hopes of enlightenment, but until I can define the question properly, I’m going to be one confused geek.
Program Verification
I’ve liked the idea of program verification since I first saw it, way back in the form of Z. Program verification has been around since Hoare’s paper in 1969, but it’s never really caught on. I suspect part of the reason is that it was expressed through notations like Z.
This is a pity, because verification is such a simple, beautiful idea that it’s popped up in many different forms, from Design by Contract to Defensive Programming to Unit Testing to Assertions to Static Code Analysis. They take different attitudes, but look for the same results: testing the code in a single method or class such that invalid input is rejected and any hidden bugs are made visible. And the faster you find bugs, the easier it is to fix them.
In this particular case, I wanted to find possible NullPointerExceptions (NPEs). Dereferencing a null pointer is one of the classic mistakes that most people make in their first program. It’s still incredibly easy to make. It’s also incredibly easy to prevent. There are only so many ways to get a reference.
NPE can come from parameters:
public void foo(String pNull) {
pNull.trim();
}
or internal state:
String mNull = null;
public void foo() {
mNull.trim();
}
or from an external method:
public void foo() {
getNull().trim();
}
If you check all your references, NPE cannot happen in your code. All you have to do is read through the code and think “could this be null here?”
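In code, checking all your references looks something like this minimal sketch (hypothetical names; `getExternal()` stands in for any method you don’t control). Whether a null should be rejected or mapped to a default is a per-case decision; the sketch shows one of each:

```java
class NullSafe {
    private String mValue; // internal state that may legitimately be unset

    // Stands in for an external method that may return null.
    String getExternal() { return null; }

    // 1. Parameters: reject null at the boundary with a clear message.
    String fromParameter(String pValue) {
        if (pValue == null) {
            throw new IllegalArgumentException("pValue is null");
        }
        return pValue.trim();
    }

    // 2. Internal state: decide what "unset" means instead of dereferencing blindly.
    String fromState() {
        return mValue == null ? "" : mValue.trim();
    }

    // 3. External methods: check the return value before using it.
    String fromExternal() {
        String external = getExternal();
        return external == null ? "" : external.trim();
    }
}
```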
Since the checks are this simple, it made sense that I could find or build a tool to verify that a program cannot suffer from NPE.
I’ve been using PMD to find bugs in code. I ran up against the limits of PMD when I tried to write a rule that would trace variables for NPE. PMD generates an abstract syntax tree of code. Although it can recognize patterns of code, it doesn’t know that a variable referenced in one pattern is the same variable in the other pattern. I could have added that functionality, but I figured there must be tools which tracked the flow and state of variables in a method and could determine if they were null or not.
And there are. They’re called Data Flow Analysis tools. But some of them cost serious money, and some of them consume outrageous amounts of CPU time (JTest and Excelsior FlawDetector, respectively). So I started looking at Design by Contract tools to see if I could at least prevent null parameter values from being passed to a method through preconditions.
I’ve looked at DBC tools repeatedly throughout the years, but they’ve always seemed too jerry-rigged to use in production systems. Things like iContract, jContractor, and JMSAssert. They always felt strangely unfinished, more like proofs of concept than actual tools. And they were old, untouched for years. (Jass looks like the exception, though I haven’t tried it myself.)
I found JML this time around. This is a DBC language designed for use in Java. It feels like a real, solid tool. Unlike Z, it uses a Java-like syntax, and hides the formal specification language under the covers. It supports postconditions on exceptional states, i.e. mandating that an exception must contain a non-null message. It can even define state modelling variables, so that a JavaBean property variable can be proved to be non-null after its mutator is called. It’s supported by several applications and has an active base of researchers who are finding new and interesting ways to exploit JML to say things about dynamic program state that weren’t possible before. The PDF tutorial is available here.
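A small JML-annotated JavaBean illustrating the features just mentioned: a non-null invariant on a property, a normal-behavior specification for its mutator, and an exceptional one that requires the exception to carry a message. This is a sketch assuming standard JML syntax and has not been run through jmlc or ESC2 here; to javac the annotations are just comments. `Person` is hypothetical.

```java
class Person {
    // spec_public lets the public specifications below mention this private field.
    private /*@ spec_public @*/ String name = "";

    //@ public invariant name != null;

    /*@ public normal_behavior
      @   requires pName != null;
      @   assignable name;
      @   ensures name == pName;
      @ also
      @ public exceptional_behavior
      @   requires pName == null;
      @   assignable \nothing;
      @   signals_only IllegalArgumentException;
      @   signals (IllegalArgumentException e) e.getMessage() != null;
      @*/
    public void setName(String pName) {
        if (pName == null) {
            throw new IllegalArgumentException("name must not be null");
        }
        name = pName;
    }

    //@ ensures \result != null;
    public /*@ pure @*/ String getName() {
        return name;
    }
}
```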
The suite of tools provided with JML encompasses jmlc (a compiler that adds run-time assertions to the compiled bytecode automatically), jmlunit (generates unit tests from the JML annotations), jmldoc (adds JML annotations to the javadoc, although you’d think that just specifying a doclet would be adequate), and finally escjava2 (extended static checker). Of these, the extended static checker is the most interesting.
The extended static checker (ESC), although it’s mentioned as part of JML, has actually been through a long history spanning ten years. The first version was written to check Modula-3. Then there was a version to check Java, which used a limited subset of JML. Finally, there’s a second version of ESC which has been rewritten to integrate better with JML. This latest version, if it has an official name, is ESC/Java 2 [edit: mistyped this as ESC2/Java], which I’m abbreviating as ESC2. I mention this up front because many of the papers will mention ESC/Java, which is the older version written by the Compaq Research Center.
ESC2 does exactly what I want: it reads the source code of a program and points out every place where a null could be dereferenced. It will also check the JML pre- and postconditions to see if it can prove that they always hold. If you call methods from a class library or JAR file, you can provide a specification file which gives ESC2 assurances that those external methods cannot return null.
Here’s an example:
public void foo() {
getString().trim();
}
will be flagged by ESC2 automatically
Warning: Possible null dereference (Null)
getString().trim();
           ^
unless you specify
/*@ ensures \result != null; @*/
public String getString() { ... }
to the source code or the specification file.
The downside is that you have to add these annotations everywhere, or add null checks which make the annotations unnecessary. I think it’s safer to write the null checks anyway, because JML annotations will only work in situations where someone is using the JML toolkit. That puts more weight on the organization to use the toolkit in their build process, and if they don’t… well, your code will still die with NPE when someone else’s code passes a null parameter into your method, and the fact that it’s got some javadoc saying you shouldn’t do that is going to be no consolation when they call you at 2 in the morning.
JML opens up ways of programming that weren’t possible before. With JML, I can write the assertions first, generate the unit tests from JML, and prove the assertions correct with ESC2. Without starting the program. Not only that, anyone else who interacts with my code can test their code against mine without even reading the javadoc and know that it’s covered against basic errors.
And I can tag methods I know to be buggy so that the bugs will be caught by JML. For example:
public void foo() {
StringTokenizer st = new StringTokenizer(null);
}
This code won’t trigger any ESC2 warnings, but StringTokenizer has a bug which will cause NPE here. But I can tag StringTokenizer with JML to prevent this bug from ever being reached.
Because ESC2 has a common base in JML, it can be used with other programs that produce JML. Daikon is a tool which will look at a dynamically running program and try to find invariants and preconditions for a set of classes. It will output those constraints as JML, and ESC2 can then be used to check them. Here’s the paper on the result. (Note that he’s using ESC, not ESC2, so some of the limitations may no longer apply.)
There’s even an empirical evaluation of Daikon, ESC and Houdini (an annotation assistant that’s not publicly available, so don’t bother trying to find it) which determines how useful these tools are in practice. The upshot seems to be that it works, but that it helps to have an annotation assistant and a fast computer when checking code. You may want to read Hacknot’s post on how studies can be skewed to the expected result before taking this too seriously.
Now for the bad news. ESC2 is still very clearly academic software, and is still in Alpha [edit: although, given the amount of JML in it, it’s probably higher quality than most other alpha software]. The [edit: installation] documentation in the binary release is unclear. There’s no Ant task. The ESC2 command line options have very little explanation. There’s a bug in resolving String invariants. Also, ESC2 doesn’t actually prove theorems itself: it generates theorem statements and passes them to another program called Simplify.
Simplify is actually written in Modula-3, is distributed as a binary executable, and is currently unmaintained. When I ran ESC2 on an ATG Dynamo project, ESC2 managed to crash Simplify twice by feeding it some hugely complex theorems.
That being said, this is one seriously kick ass tool just for the NPE detection alone. With some extra integration and some more build support, ESC2 could bring coding standards to a whole new level.
RDF as data abstraction
David Parnas was the first to advocate modularity and (depending on how much of a pedant you are) information hiding as techniques in software engineering. Later, Alan Kay used human cells as a model and came up with Object Oriented Programming.
Kay’s model wasn’t just about classes. He also advocated the idea of message passing, which is how data gets from one object to another. Ideally, in Kay’s world, you don’t call a method with parameters so much as send a message with data to an object. Messaging provides an explicit place for state changes. It also provides a role for thinking about data and data structures as entities in their own right, aside from any implementation details.
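A rough Java sketch of the idea, with hypothetical names: the message is plain data (a selector plus named arguments), and the receiver alone decides how to interpret it, so every state change happens in one place. The fallback reply borrows its name from Smalltalk’s `doesNotUnderstand:`.

```java
import java.util.Map;

// A message is plain data: a selector name plus named arguments.
final class Message {
    final String selector;
    final Map<String, Object> data;

    Message(String selector, Map<String, Object> data) {
        this.selector = selector;
        this.data = data;
    }
}

// The receiver interprets each message; all state changes happen in receive().
class Counter {
    private int count;

    Object receive(Message m) {
        switch (m.selector) {
            case "increment":
                count += (Integer) m.data.getOrDefault("by", 1);
                return count;
            case "value":
                return count;
            default:
                return "doesNotUnderstand: " + m.selector;
        }
    }
}
```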
You could argue that most software is concerned with managing data, but this is only true in the vaguest possible sense. Most software deals with business domain objects, and raw data structures are rarely passed across interfaces. This is a pity, because there are really only a few basic data structures of note: graphs, trees, lists, and maps. Even so, it took decades to agree on the definitions of those structures.
One of the benefits of Java was that integers, doubles, floats and Strings were well defined. Even then, odd bits of floating-point arithmetic, arbitrary-precision numbers (BigInteger and BigDecimal) and signed bytes stuck out all over the place, which gives an indication of how tricksy defining the basics of data can be. The fact is that there are very few standardized ways of dealing with complex data in Java. The Collections API will give you half of the picture with List and Map, but it has no concept of trees or graphs, and may never have one.
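Because `java.util` has no graph type, the usual workaround is to build one out of the List and Map interfaces it does have. Here is a minimal sketch, assuming an adjacency-list representation keyed by node name. `AdjacencyGraph` and its methods are made-up names, not part of any standard API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// A directed graph assembled from Collections types: each node maps to the
// list of nodes it has edges to. Nothing in java.util knows this is a graph.
class AdjacencyGraph {
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    void addEdge(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        edges.computeIfAbsent(to, k -> new ArrayList<>()); // register the target node too
    }

    // Breadth-first traversal, the sort of routine every project ends up
    // writing for itself because the Collections API does not provide one.
    List<String> reachableFrom(String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            for (String next : edges.getOrDefault(node, List.of())) {
                if (seen.add(next)) { // add() returns false if already visited
                    queue.add(next);
                }
            }
        }
        return new ArrayList<>(seen);
    }
}
```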
Data is still a very large problem. Complex data structures are almost impossible to translate faithfully between disparate systems. The WS-I group have explicitly given up and recommended dealing with raw XML data in literal (document/literal) style rather than trying to map complex types through SOAP encoding.
There has been some headway. Interaction with databases has been standardized, but only to a very limited degree. Most applications will encapsulate data from a ResultSet as fast as they can, and justifiably so. Some business domain objects encompass data from a dozen different tables. Who wants to be tied to the underlying implementation? Given that the whole point of encapsulation is to prevent tight coupling to the implementation, it’s no surprise that data is abstracted away into data (or domain) objects as soon as possible.
Except in Mozilla. Mozilla’s central interface for dealing with data is RDF. RDF datasources can have any implementation they like behind them. However, they must provide the information through the nsIRDFDataSource interface that specifies a graph API of nodes and edges. Once this is done, any client can operate with the datasource without reference to the domain objects, methods, or underlying implementation. You just need to know how to manipulate a graph, and you’re set.
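Here is a Java sketch of that kind of contract. The client sees only nodes and labelled edges, and the backing store is hidden behind an interface. It is not Mozilla's actual interface. `Triple`, `GraphSource`, `InMemorySource` and `GraphClient` are hypothetical names. Using null as a wildcard is similar in spirit to Jena's `Model.listStatements(s, p, o)`.

```java
import java.util.ArrayList;
import java.util.List;

// One labelled, directed edge: subject --predicate--> object.
record Triple(String subject, String predicate, String object) {}

// Hypothetical graph-shaped data interface. A null argument acts as a wildcard.
interface GraphSource {
    List<Triple> find(String subject, String predicate, String object);
}

// One possible backing implementation. A database- or web-service-backed
// class could implement the same interface without clients noticing.
class InMemorySource implements GraphSource {
    private final List<Triple> triples = new ArrayList<>();

    void add(String s, String p, String o) {
        triples.add(new Triple(s, p, o));
    }

    @Override
    public List<Triple> find(String s, String p, String o) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : triples) {
            if ((s == null || s.equals(t.subject()))
                    && (p == null || p.equals(t.predicate()))
                    && (o == null || o.equals(t.object()))) {
                result.add(t);
            }
        }
        return result;
    }
}

// Client code: knows how to walk a graph, nothing about domain objects or storage.
class GraphClient {
    static List<String> targets(GraphSource source, String subject, String predicate) {
        List<String> out = new ArrayList<>();
        for (Triple t : source.find(subject, predicate, null)) {
            out.add(t.object());
        }
        return out;
    }
}
```

`GraphClient.targets` would work unchanged against any other `GraphSource`. That independence from the backing store is the whole point of the abstraction.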
This is the true potential of RDF. It’s not a substitute for databases or XML. RDF is a directed labelled graph. It hides information and provides a data interface that can aggregate multiple datasources, which can themselves be RDF datasources. This is data abstraction in its purest form. Theoretically, if you weren’t worried about performance, you could access an RDF datasource and not know or care whether you’re accessing a database, a web service, or both. The RDF datasource might even optimize its data structures based on your queries or iteration patterns, in much the same way as HotSpot compilation optimizes frequently executed code in the JVM.
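The aggregation idea can be sketched the same way. A composite datasource implements the same graph interface and merges answers from its members, so a caller cannot tell whether an edge came from a database, a web service, or both. This is a hypothetical sketch: `Arc`, `ArcSource`, `CompositeSource` and the `fixed` helper are invented names. It ignores performance and conflicting values, which a real aggregator would have to handle.

```java
import java.util.ArrayList;
import java.util.List;

// A labelled edge; null query arguments act as wildcards (hypothetical contract).
record Arc(String subject, String predicate, String object) {}

@FunctionalInterface
interface ArcSource {
    List<Arc> find(String subject, String predicate, String object);
}

// A composite is itself an ArcSource, so composites can be nested.
class CompositeSource implements ArcSource {
    private final List<ArcSource> members;

    CompositeSource(List<ArcSource> members) {
        this.members = List.copyOf(members);
    }

    @Override
    public List<Arc> find(String s, String p, String o) {
        List<Arc> merged = new ArrayList<>();
        for (ArcSource member : members) {
            for (Arc arc : member.find(s, p, o)) {
                if (!merged.contains(arc)) { // records compare by value, so duplicates collapse
                    merged.add(arc);
                }
            }
        }
        return merged;
    }

    // Demo helper: a source backed by a fixed list, filtered by the wildcard pattern.
    static ArcSource fixed(List<Arc> arcs) {
        return (s, p, o) -> {
            List<Arc> out = new ArrayList<>();
            for (Arc a : arcs) {
                if ((s == null || s.equals(a.subject()))
                        && (p == null || p.equals(a.predicate()))
                        && (o == null || o.equals(a.object()))) {
                    out.add(a);
                }
            }
            return out;
        };
    }
}
```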
A good question at this point is whether RDF is as powerful an API as JDBC. The answer is a qualified yes. RDF is only a data structure, and there are several APIs for manipulating it. The best-known implementation is Jena. Jena’s API for manipulating RDF graphs is very powerful, and encompasses most things you would find in JDBC, such as transactions, a query language, and prepared statements.
I’ve heard the arguments about RDF serialization, and the use of RDF in web services. I’m not very interested, because I think the XML serialization is the least interesting part. You can take RDF and serialize it in any format you want. If you’re dealing with a well-defined set of attributes, you can export it as XML and import it back from XML, and the data structure is just as valid as it ever was.
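To show that the serialization really is incidental, here is a sketch that writes a set of statements out as tab-separated lines and reads them back into an equal set. The format is made up for illustration. It is not N-Triples, RDF/XML or any other standard syntax, and it assumes no value contains a tab or a newline. `Statement` and `TabFormat` are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// One subject/predicate/object statement (hypothetical type for this sketch).
record Statement(String subject, String predicate, String object) {}

class TabFormat {
    // One statement per line, fields separated by tabs.
    static String write(Set<Statement> graph) {
        StringBuilder sb = new StringBuilder();
        for (Statement st : graph) {
            sb.append(st.subject()).append('\t')
              .append(st.predicate()).append('\t')
              .append(st.object()).append('\n');
        }
        return sb.toString();
    }

    // Parses the same format back into a set of statements.
    static Set<Statement> read(String text) {
        Set<Statement> graph = new LinkedHashSet<>();
        for (String line : text.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split("\t", -1);
            if (fields.length != 3) {
                throw new IllegalArgumentException("Bad line: " + line);
            }
            graph.add(new Statement(fields[0], fields[1], fields[2]));
        }
        return graph;
    }
}
```

The graph is a set of statements, so it compares equal after the round trip. Switching to XML or any other syntax would only change `write` and `read`.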
PSP Time Tracking
I read Watts Humphrey’s Introduction to the Personal Software Process a long time ago, probably when it first came out in 1996. There were some good ideas which sounded impractical at best, and some okay ideas which didn’t seem like they deserved a book. One of the things I remember best from the PSP was the idea that the best way to fix bugs was to print everything out and go over the code as if you were a compiler, holding the state in your head. This turned out to be a great idea, and it taught me the benefits of reading code as opposed to just scanning through it.
However, for the most part the PSP is a complete bore. Forms. Spreadsheets. Metrics. Detailed time tracking. I mean, minute-by-minute time tracking. Under PSP, you’re supposed to write down what you do before you do anything.
The most horrible part was the assumption that this was somehow doable or worthwhile. There was another feeling as well: the dread that under such fine scrutiny, my life might turn out to be horribly inefficient and wasteful. In some ways, PSP is every-minute Zen. Everyone would like to be an incredibly cool Zen master if they didn’t have to pay attention all the time.
Luckily, everyone else felt the same way and I never had to use it. The few people who admit to this freakish habit have to hide it from their friends and family. The only places I’ve ever seen PSP mentioned have been in CMM and IEEE presentations, and the odd thesis here and there.
The fact remains that PSP has been proven to be useful. It’s been taught to people in groups – perhaps with a lot of pain and not a little denial, but normal people can do this.
Christopher Duncan brought up the idea independently in The Career Programmer. One of the techniques he advocates is tracking time in a database. Here’s his quote:
“Yeah, yeah, I know; it seems like a hassle. And it is, in the beginning. However, you quickly get used to it, and in reality it takes less than thirty seconds to make an entry. If it takes any longer than that, you might want to brush up on your typing.”
What finally convinced me is that time tracking serves two purposes. You have a record of where your time is being spent. But you also have a record of every stupid little task that tripped you up when you were trying to get something done. If you keep a technical log, then you never have to remember anything. Everything you do is repeatable.
Not only that, you no longer have to guess at how long a task will take you. You know, because you’ve done it before.
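Here is a sketch of that idea: a minimal time log where each entry records a task category, the time spent and a note, and the estimate for the next task in a category is simply the mean of past durations. The `TimeLog` and `Entry` names and the plain mean are assumptions for illustration. The PSP's own PROBE estimating method is more involved.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalDouble;

class TimeLog {
    // One logged piece of work; the note doubles as the technical log.
    record Entry(String category, Duration spent, String note) {}

    private final List<Entry> entries = new ArrayList<>();

    void log(String category, Duration spent, String note) {
        entries.add(new Entry(category, spent, note));
    }

    // Mean minutes spent on past tasks in this category; empty if there is no history.
    OptionalDouble estimateMinutes(String category) {
        return entries.stream()
                .filter(e -> e.category().equals(category))
                .mapToLong(e -> e.spent().toMinutes())
                .average();
    }

    // Every note recorded against a category, e.g. to recall what tripped you up last time.
    List<String> notesFor(String category) {
        return entries.stream()
                .filter(e -> e.category().equals(category))
                .map(Entry::note)
                .toList();
    }
}
```

Returning an `OptionalDouble` keeps "no history yet" distinct from "zero minutes". That distinction is the point where guessing is still unavoidable.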
So I’m sold on the concept. Implementing it is another matter.
There are a number of time trackers out there. Some of the ones I’ve tried are Time Shadow, TimeCore, and Ecco Pro’s internal time tracker. They all have one problem or another. TimeShadow doesn’t export data easily. TimeCore’s interface is clunky. Ecco’s internal time tracker is clearly kludged onto the rest of the system. But the big problem they all share is that they don’t account for flakiness.
For a period in January, I logged my time using TimeCore and Karen’s Countdown Timer. I set the countdown timer to pop up every 30 minutes so I could note what I was doing over the course of the day. This didn’t quite work: if I was away from the computer, the screen was locked, or there was a phone call, it wouldn’t pick up the change. Or I just didn’t see the thing. Whatever. The issue is not that I forgot to log my time, but that TimeCore and the other utilities made it really hard to shift my entries around to reflect reality. They complained about overlapping times, and then they complained about gaps.
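What the post is after is a tracker that repairs messy entries instead of rejecting them. One way to sketch that is to sort entries by start time, trim any entry that overlaps the previous one, and turn gaps into explicit "unaccounted" intervals to fill in later. The `Span` type and the trim-the-later-entry policy are assumptions for illustration. This is not how TimeCore or any other tool behaves.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// One logged interval of work (hypothetical type for this sketch).
record Span(LocalTime start, LocalTime end, String task) {}

class Timesheet {
    // Repairs a messy list of entries instead of rejecting it:
    // overlaps are trimmed and gaps become explicit "unaccounted" spans.
    static List<Span> normalize(List<Span> raw) {
        List<Span> sorted = new ArrayList<>();
        for (Span s : raw) {
            if (s.end().isAfter(s.start())) { // ignore empty or backwards entries
                sorted.add(s);
            }
        }
        sorted.sort(Comparator.comparing(Span::start));

        List<Span> result = new ArrayList<>();
        LocalTime cursor = null; // end of the last accepted span
        for (Span s : sorted) {
            LocalTime start = s.start();
            if (cursor != null && start.isBefore(cursor)) {
                start = cursor; // overlap: this entry now starts where the previous one ended
            } else if (cursor != null && start.isAfter(cursor)) {
                result.add(new Span(cursor, start, "unaccounted")); // gap: surface it
            }
            if (s.end().isAfter(start)) { // skip entries swallowed by the previous one
                result.add(new Span(start, s.end(), s.task()));
                cursor = s.end();
            }
        }
        return result;
    }
}
```

Trimming the later entry is an arbitrary choice. A real tool might instead ask the user, or split the difference.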
So I had remembered time tracking, but I hadn’t remembered PSP. Once I realized that PSP talked about exactly the same thing, I stopped looking for time trackers and started looking for PSP automation tools. I had far more success doing this.
Software Process Dashboard, Personal Time Manager and PSP LogControl seem to be the top standalone tools right now. Personal Time Manager looks really intimidating (36 MB?!?), but appears to have masses of functionality. PSP LogControl goes to the other extreme: very small, but very well defined.
On the other hand, the Collaborative Software Development Laboratory has a project called Hackystat which looks just incredibly cool. It’s actually an advance over PSP, as it checks the work actually done against the user’s estimate instead of forcing the user to enter time manually. It does require software sensors embedded in your tools, such as its Emacs and Eclipse plugins. But for what you get… wow. With an entire development team using this system, you could plot the completion of a system in real time.
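Sensor-based tracking appeals because time is inferred from events the tools already see, rather than typed in by hand. Here is a rough sketch of one way to do that: put timestamped editor events into fixed intervals and count an interval as active if it contains at least one event. The five-minute interval and the `ActiveTime` name are assumptions for illustration. This is not Hackystat's code, API or exact algorithm.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ActiveTime {
    // Counts the distinct fixed-size buckets (aligned to the epoch) that contain
    // at least one event, and returns the total time those buckets represent.
    static Duration estimate(List<Instant> events, Duration bucket) {
        long bucketSeconds = bucket.getSeconds();
        if (bucketSeconds <= 0) {
            throw new IllegalArgumentException("bucket must be at least one second");
        }
        Set<Long> activeBuckets = new HashSet<>();
        for (Instant e : events) {
            activeBuckets.add(Math.floorDiv(e.getEpochSecond(), bucketSeconds));
        }
        return bucket.multipliedBy(activeBuckets.size());
    }
}
```

With buckets, a burst of edits counts once and a long idle stretch counts as nothing. That is the trade-off this approach makes against manual entry.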
Will try these out and see what works. Setting up a Hackystat server might be a pain, but if it means I get PSP without the pain, it’ll be worth it.
And as far as logging technical information… if it’s important, I’ll dump it into Ecco’s Inbasket and sort it out later.