| ABSTRACT |
|---|
| Object serialization, the act of writing an object onto some medium in a serial, byte-at-a-time manner, is essential for communications systems and for persistence of data in object-oriented systems like Java. With the advent of JDK 1.1 we are provided with the means to do this automatically. We examine ways to achieve the same effect under all versions of Java, and we examine the benefits of using a parsable ASCII format that lets objects serialize themselves in a human-readable and human-editable form. |
Traditional serialization is usually done in some binary manner, principally so as to minimize the amount of time spent transmitting data or the amount of space spent storing the data. These pragmatic reasons have driven the decision on how the serialization format should be structured.
This paper revolves around a format which is structured to be human readable. Such a format allows us to use the serialization mechanism for purposes for which it may not initially have been created. Examples are as a configuration format for a program or as an aid to debugging, by allowing us to examine all parts of the data object.
To appreciate the simplicity of the new Java offerings, let's look at what we had to do before they arrived on the scene. Traditional object-oriented systems bundle procedures with instances of data objects, which lets us bundle a serialization interface and a deserialization interface with each object. These interfaces are then responsible for reading and writing the instance from and to a serial stream. In general each procedure deals with each simple field of the data structure in a straightforward manner and, for each complex data type, calls down to the similar routines provided by that object to perform the same function. This method works in any language; OO languages merely make it more natural to bundle data with the routines that know how to deal with it, which leads to some simplification and to easier maintenance. These are some of the benefits the OO pundits have been telling us about for years, and they are real.
So let's look at a class with some serialization code in it. For Java, we start by defining an interface for our serialization routines. Classes which take part in the serialization process then declare that they implement this interface. Because it is an interface rather than a base class, the single-inheritance class model of the application remains intact in the face of the serialization code, which is often an important consideration. Our interface code looks like this:
import java.io.*;

public interface Serial
{
    // the method that writes an object serially
    public void writeObj(ObjOutputStream out)
        throws IOException;

    // the method that reads an object serially
    public void readObj(ObjInputStream in)
        throws IOException;
}
As we can see the interface does little more than tell the Java
compiler which methods a class implementing the interface is required
to provide.
This in turn brings us to the class itself. By way of example we have a simple class with only a few fields. The read method simply reads each field in turn and the write method similarly writes each field in turn.
import java.io.*;

public class test1_0 implements Serial
{
    public long aa;
    public long bb;
    public int cc;
    public composite dd;

    // null constructor - needed for serial to work.
    public test1_0()
    {
    }

    // Write our object out as a string of bytes.
    public void writeObj(ObjOutputStream out) throws IOException
    {
        out.writeLong(aa);
        out.writeLong(bb);
        out.writeInt(cc);
        out.writeObj(dd);
    }

    // Construct the object fields from a byte stream.
    public void readObj(ObjInputStream in) throws IOException
    {
        aa = in.readLong();
        bb = in.readLong();
        cc = in.readInt();
        dd = (composite)in.readObj();
    }
}
Note that we have a nested object as one field and we assume it
to also implement the Serial interface. This allows us to simply
use the same routines to cause it to serialize itself as part of this
process. The need for the null constructor isn't immediately obvious
but will become clear.
It's also worth noting that what we've shown is all that is required on a per-class basis. As was suggested, we simply deal with each field once in the read method and once in the write method. There is of course a bit more support code required, but this is provided only once and can be reused for as many objects and streams as we need. To kick off the process and to manage the stream to which we write, we provide a class which manages the serialization for us. In fact, much of the subtlety is hidden by simply subclassing the JDK's DataOutputStream class. Initialization then consists of passing in a more fundamental stream type, which we hand on to our superclass, and providing the method which starts off the serialization process. Note that we also inherit from our superclass a number of utility methods for writing primitive types to the stream.
1 import java.io.*;
2
3 public class ObjOutputStream extends DataOutputStream
4 {
5 // constructor
6 public ObjOutputStream(OutputStream out)
7 throws IOException
8 {
9 super(out);
10 }
11
12 // write an object to the stream
13 public synchronized void writeObj(Serial obj)
14 throws IOException
15 {
16 writeUTF(obj.getClass().getName());
17 obj.writeObj(this);
18 }
19 }
The serial format then consists of the class name written as a string
(at line 16) followed by the fields we chose to write, in the order we
chose to write them (by calling out to the writeObj() method of the
object at line 17). Note that when we encountered our nested object in
our class we chose to call out to the writeObj() method of the stream
rather than that of the object itself, meaning that the nested class
appears in the stream in an exactly congruent manner.
The choice of using the class name as a marker in the stream for the object data is an interesting one. Traditionally, we would have chosen to emit a packet type or some other token which stood in stead of the class name. RPC in fact had a registry for recording such values (and remember that in their model a class usually implied an action) and standards such as ASN.1 have complex syntaxes for deriving the names themselves. Typically, the packet type approach simply means that the object supplies a constant in some manner to the serializer which it emits.
On input then, the packet type is examined and the appropriate type of class is instantiated and told to deserialize itself from the remaining data in the stream. This usually manifests as a switch statement with one case for each of the packet types/classes we intend to serialize.
ptype = readInt();
switch (ptype) {
case 1:
    obj = new Type1();
    obj.readObj(this);
    break;
....
default:
    // line noise??
    break;
}
While this arrangement works satisfactorily, we can in fact do slightly
better in Java. This is because Java provides a facility where we can
instantiate a class for which we have the name at run time. In this
case, the choice of writing the class name as a string starts to make
more sense. This lets us write our top level Input handler like this:
import java.io.*;

public class ObjInputStream extends DataInputStream
{
    // create a new ObjInputStream
    public ObjInputStream(InputStream in)
        throws IOException
    {
        super(in);
    }

    // create an object from the information in the stream.
    public synchronized Object readObj()
        throws IOException
    {
        Serial obj;
        String className;

        className = readUTF();
        try {
            obj = (Serial)Class.forName(className).newInstance();
        }
        catch (Exception e) {
            throw new IOException(e.toString());
        }
        obj.readObj(this);

        return obj;
    }
}
And that's it. We have all the infrastructure we need to read and write
classes, provided we prepare them in a manner similar to the example
class we showed at the beginning. You'll note that this is where the need
for a constructor with no arguments arises: the newInstance() method can
only trigger such constructors. The Reflection API which arrived with
JDK1.1 adds the ability to build argument lists and thus trigger other
types of constructor, whereas newInstance() is all the earlier version
of Java had available. Fortunately, this service is all we need here.
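To make the distinction between the two kinds of construction concrete, here is a minimal sketch that builds objects from a class name held in a string, first through the no-argument constructor and then through a constructor that takes an argument list. The class name `ReflectiveConstruction` and its two helper methods are made up for this illustration, and `java.lang.StringBuffer` is used only because it offers both kinds of constructor. The sketch assumes a recent JDK, where `getDeclaredConstructor().newInstance()` is preferred because `Class.newInstance()` has since been deprecated.

```java
import java.lang.reflect.Constructor;

class ReflectiveConstruction {
    // Build an instance through the no-argument constructor,
    // given nothing more than the class name as a string.
    static Object createDefault(String className) throws Exception {
        return Class.forName(className).getDeclaredConstructor().newInstance();
    }

    // Build an instance through a public constructor taking one String,
    // which is the "argument list" facility of the Reflection API.
    static Object createFromString(String className, String arg) throws Exception {
        Constructor<?> ctor = Class.forName(className).getConstructor(String.class);
        return ctor.newInstance(arg);
    }

    public static void main(String[] args) throws Exception {
        Object empty = createDefault("java.lang.StringBuffer");
        Object filled = createFromString("java.lang.StringBuffer", "abc");
        System.out.println(empty.getClass().getName());
        System.out.println(filled);
    }
}
```

For deserialization only the first form is needed, since the fields are filled in afterwards from the stream.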
For completeness, let's have a look at what we need to do in the calling code:
1 import java.io.*;
2
3 public class TestA
4 {
5 public static void main(String[] args)
6 {
7 ObjOutputStream out = null;
8 test1_0 msg = null;
9
10 // instantiate object(s)
11 ....
12
13 // output the data to the file.
14 try {
15 out = new ObjOutputStream(
16 new FileOutputStream("ser_test"));
17 out.writeObj(msg);
18 out.flush();
19 out.close();
20 }
21 catch (Exception e) {
22 System.err.println("Exception: " + e);
23 System.exit(1);
24 }
25
26 System.exit(0);
27 }
28 }
As with most examples, the bulk of the code above is incidental to the
point being made. The important part is the creation of the streams at
lines 15 and 16; for simplicity we write the data to a file here. Line 17
is where all the magic happens, and that's it. The file is closed and
there are various bits of housekeeping, but all the important things
have already happened.
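Reading the object back is the mirror image of the calling code above. The following is a minimal, self-contained sketch of a full round trip, not the paper's own classes: `Serial`, `ObjOut`, `ObjIn` and `Point` are compact stand-ins nested inside one hypothetical `SerialRoundTrip` class so that everything compiles as a single file, an in-memory buffer replaces the file, and `getDeclaredConstructor().newInstance()` is used in place of `Class.newInstance()` on the assumption of a recent JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class SerialRoundTrip {
    // Stand-in for the Serial interface: one method per direction.
    interface Serial {
        void writeObj(ObjOut out) throws IOException;
        void readObj(ObjIn in) throws IOException;
    }

    static class ObjOut extends DataOutputStream {
        ObjOut(OutputStream out) { super(out); }

        void writeObj(Serial obj) throws IOException {
            writeUTF(obj.getClass().getName()); // class name is the type marker
            obj.writeObj(this);                 // then the object's own fields
        }
    }

    static class ObjIn extends DataInputStream {
        ObjIn(InputStream in) { super(in); }

        Object readObj() throws IOException {
            String className = readUTF();
            Serial obj;
            try {
                // Needs the no-argument constructor of the target class.
                obj = (Serial) Class.forName(className)
                                    .getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IOException(e.toString());
            }
            obj.readObj(this);
            return obj;
        }
    }

    // Example payload: fields are read in the order they were written.
    public static class Point implements Serial {
        public long x;
        public int y;

        public Point() { }

        public void writeObj(ObjOut out) throws IOException {
            out.writeLong(x);
            out.writeInt(y);
        }

        public void readObj(ObjIn in) throws IOException {
            x = in.readLong();
            y = in.readInt();
        }
    }

    // Serialize to a byte buffer, then rebuild a fresh object from it.
    static Point roundTrip(Point p) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjOut out = new ObjOut(buf);
        out.writeObj(p);
        out.flush();
        ObjIn in = new ObjIn(new ByteArrayInputStream(buf.toByteArray()));
        return (Point) in.readObj();
    }
}
```

The object returned by `roundTrip()` is a new instance whose fields carry the values that were written.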
As simple as the example and indeed the methodology is, there are a few observations we can make and a few conclusions we can draw.
In a manner similar to our serialization for JDK1.0, JDK1.1 provides master ObjectInputStream and ObjectOutputStream classes which operate at the top level to control the process as a whole. In fact, the code in the target class consists of nothing more than adding the implements clause to the class header and the client code looks remarkably similar to that we wrote above. We have in fact contrived through careful choice of names to make it no more than a simple substitution of Object for Obj in the example.
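As a point of comparison, a minimal sketch of the built-in mechanism follows. The class names `JdkSerializationDemo` and `Sample` are invented for the illustration, and an in-memory buffer stands in for a file; the stream classes and the `Serializable` marker interface are the standard `java.io` ones. The try-with-resources syntax assumes a JDK far newer than 1.1.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class JdkSerializationDemo {
    // The implements clause is the only serialization code in the class.
    public static class Sample implements Serializable {
        private static final long serialVersionUID = 1L;
        public long aa;
        public long bb;
        public int cc;
    }

    // Write the object to a buffer and read a copy back from it.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            return in.readObject();
        }
    }
}
```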
So how does the JDK1.1 implementation compare in terms of the issues we mentioned above? Let's revisit them one at a time.
The next step along would be for classes to regress and provide readObject() and writeObject() methods which implement whatever policy is more appropriate than the default. The default action is provided as a hook so that those cases which simply wish to add pre- or post-processing can do so easily.
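A small sketch of such a class is shown here, under the assumption that the hook in question is the standard `defaultWriteObject()` / `defaultReadObject()` pair. The `Reading` class, its fields, and the `"v1"` marker are invented for the example: the default action handles the ordinary field, and the class adds its own post-processing to append a marker on output and to rebuild a derived, transient field on input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class CustomHooksDemo {
    public static class Reading implements Serializable {
        private static final long serialVersionUID = 1L;
        public double celsius;
        public transient double fahrenheit; // derived, skipped by the default action

        public Reading(double celsius) {
            this.celsius = celsius;
            this.fahrenheit = celsius * 9 / 5 + 32;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject(); // default field-by-field output
            out.writeUTF("v1");       // post-processing: append our own marker
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();       // default field-by-field input
            String marker = in.readUTF(); // consume what writeObject appended
            if (!"v1".equals(marker))
                throw new InvalidObjectException("unexpected marker " + marker);
            fahrenheit = celsius * 9 / 5 + 32; // rebuild the derived field
        }
    }

    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            return in.readObject();
        }
    }
}
```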
Finally, a second interface named Externalizable is provided for classes which want to provide their own serializing routines and want complete control of the output data format.
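The shape of an Externalizable class is close to that of our own Serial classes, as the following sketch suggests. `ExternalizableDemo` and `Pair` are hypothetical names; `writeExternal()` and `readExternal()` are the two methods the standard interface requires, and the class must offer a public no-argument constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

class ExternalizableDemo {
    public static class Pair implements Externalizable {
        public int left;
        public String right = "";

        // Required: the stream creates the object through this constructor.
        public Pair() { }

        // The class decides exactly what goes into the stream, and in what order.
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(left);
            out.writeUTF(right);
        }

        // Reading must mirror the order used when writing.
        public void readExternal(ObjectInput in)
                throws IOException, ClassNotFoundException {
            left = in.readInt();
            right = in.readUTF();
        }
    }

    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            return in.readObject();
        }
    }
}
```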
Once again though, it's a binary format. Unlike our JDK1.0 methodology above, it's harder to bend this one to the needs of an ASCII format. Happily though, JDK1.1 also offers us a new API called reflection, which allows objects to examine the structure of underlying classes at run time. This is principally provided as a mechanism to implement the Java Beans component API but is available for other uses.
Parts of the JDK1.1 ObjectStreams are built upon the foundation which reflection offers. This means we can use reflection ourselves to perform similar functions.
Like the other formats, we will start by defining an interface. In common with the JDK1.1 approach, this is only required by those classes which want to offer overriding methods in place of the default actions of these classes.
/*
 * The interface for objects that can be serialized in ASCII format.
 */
import java.io.*;

public interface AsciiObject
{
    /**
     * the method that writes an object in ASCII form
     */
    public void writeAsciiObject(AsciiObjectOutputStream out)
        throws IOException;

    /**
     * the method that reads an object back from ASCII form
     */
    public Object readAsciiObject(AsciiObjectInputStream in)
        throws IOException;
}

Unlike JDK1.1 we have not provided an opt-out
mechanism, but the Serializable interface itself could be used if we
wish and an if (obj instanceof Serializable) test added to code
as appropriate.
We'll start by looking at the output code, as it's simpler and because its skeleton is the same as that of the input code: to read back the object which was written, we take the same steps and perform the inverse I/O operations in the same order.
We start by noting that the AsciiObjectOutputStream is subclassed from the JDK1.1 PrintWriter class. This gives it access to a print and a println method which operates on the stream it is passed. This is the stream we pass into the constructor. Additionally the constructor also sets a small amount of local state, notably the amount of whitespace we indent the output by. The whitespace is principally for humans who may examine the stream and is incremented before and decremented after nested objects.
1 /*
2 * A stream that allows us to write serialized objects as parsable ASCII.
3 */
4 import java.io.*;
5 import java.util.*;
6 import java.lang.reflect.*;
7
8 /**
9 * A stream that allows us to write serialized objects.
10 */
11 public class AsciiObjectOutputStream extends PrintWriter
12 {
13 // do we want extra debug output? 0 is off, >0 increases verbosity.
14 final static int debug = 0;
15
16 // local storage for this class.
17 protected int indent; // cosmetic whitespace to add
18 private Object instance;
19
20 /**
21 * create an AsciiObjectOutputStream
22 *
23 * @param out the output stream we are operating on.
24 */
25 public AsciiObjectOutputStream(
26 OutputStream out
27 ) throws IOException
28 {
29 // Hand the stream to PrintWriter to deal with.
30 super(out);
31
32 // We have a bit of state ourselves which we want handled.
33 this.indent = 0;
34 this.instance = this;
35 }
36
The writeAsciiObject() routine is the top level of the serialization
proper. Principally it writes a class header and arranges to emit all of
the fields of the object in question. It's worth noting that the fields
in question must largely be public, or at least accessible to the stream
class, to be serialized. We'll see why in the next section of text.
37 //------------------------------------------------------------------
38 /**
39 * write an object to the stream
40 *
41 * @param obj the object to be serialized.
42 */
43 public synchronized void writeAsciiObject(
44 Object obj
45 ) throws IOException
46 {
47 Class cl = null;
48 int mod;
49
50 try {
51 // If we got nothing to write then we have a choice of
52 // actions. For now we will simply do nothing. Long
53 // term we may wish to do something to ensure we get
54 // back a null pointer when we read.
55 if (obj == null) {
56 if (debug > 0)
57 System.err.println("tried writing NULL object");
58 // throw new IOException("tried writing NULL object");
59 return;
60 }
61
62 if (debug > 0)
63 System.err.println("writeAsciiObject("+obj+")");
64
65 // We're trying to be flexible here. There's a limit
66 // though to what makes sense. Mostly obj should
67 // be an instance of something.
68 if (obj instanceof Class)
69 cl = (Class)obj;
70 else
71 cl = obj.getClass();
72
73 // We allow a class to provide its own encoding and
74 // decoding routines. Note that the AsciiObject
75 // interface isn't mandatory for our objects but it
76 // defines the writeAsciiObject() and
77 // readAsciiObject() methods for us to use here.
78 if (obj instanceof AsciiObject) {
79 ((AsciiObject)obj).writeAsciiObject(
80 (AsciiObjectOutputStream)instance);
81 return;
82 }
83
84 // start writing the class itself. If we're calling
85 // ourselves the class name has already been emitted.
86 if (indent < 1)
87 println("class " +
88 obj.getClass().getName() +
89 " = {");
90
91 // recurse up the chain of superclasses and emit
92 // their fields...
93 write_superclass(obj, cl.getSuperclass());
94
95 // ... and iterate along the fields declared here.
96 write_fields(obj, obj);
97
98 // end marker.
99 if (indent < 1)
100 println("}\n");
101 }
102 catch (Exception e) {
103 // throw new IOException("error: "+ e);
104 System.err.println("error: "+ e);
105 e.printStackTrace(System.err);
106 }
107 }
108
This brings us to the first of the utility routines. In order to emit
all the fields of a class we can use either the getFields() method of the
reflection API or the getDeclaredFields() method. The principal
difference between these two methods is that getDeclaredFields()
provides a list of all the fields declared in a single class,
whereas getFields() returns a list of all the accessible (that is,
public) fields of a class and all its superclasses. Essentially then,
getFields() is aware of the Java scope and visibility rules and applies
the access restrictions to winnow the list of fields. In contrast,
getDeclaredFields() tells us about all the fields but only for the
immediate class; superclasses must be dealt with explicitly. That is
the purpose of this utility routine: to traverse the superclass chain
and iterate along the fields of each superclass.
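The difference between the two calls is easy to see in a small experiment. The sketch below uses two made-up classes, `Base` and `Derived`, and two hypothetical helpers: one lists what getFields() reports, the other walks the superclass chain with getDeclaredFields(), superclass first, in the same spirit as the utility routine. Synthetic fields are filtered out as a precaution, since some tools add them to classes.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

class FieldListingDemo {
    public static class Base {
        public int baseVisible;
        private int baseHidden;
    }

    public static class Derived extends Base {
        public int ownVisible;
        private int ownHidden;
    }

    // Names reported by getFields(): public fields, inherited ones included.
    static List<String> publicFieldNames(Class<?> cl) {
        List<String> names = new ArrayList<>();
        for (Field f : cl.getFields())
            names.add(f.getName());
        return names;
    }

    // Names reported by getDeclaredFields() for every class in the chain.
    static List<String> declaredFieldNames(Class<?> cl) {
        List<String> names = new ArrayList<>();
        collect(cl, names);
        return names;
    }

    // Recurse to the top of the chain first so superclass fields come first.
    private static void collect(Class<?> cl, List<String> names) {
        if (cl == null)
            return;
        collect(cl.getSuperclass(), names);
        for (Field f : cl.getDeclaredFields())
            if (!f.isSynthetic())
                names.add(f.getName());
    }
}
```

For `Derived`, the first helper reports only the two public fields, while the second reports all four, private ones included.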
The reason for choosing to deal with all the fields of a class, rather than just the accessible ones, is that this way we expose the fact that we cannot serialize every field. The JDK1.1 serialization avoids this limitation by using a native (that is, machine code) method which bypasses such access restrictions. If we were simply to process the list of accessible fields, we would silently skip the inaccessible ones; by processing every declared field we fail visibly instead. This is also the reason why we require all the fields of classes we wish to serialize this way to be accessible, i.e. usually public. For the debugging and configuration purposes for which we are creating this facility, this is not unreasonable. There may be circumstances where it does not fit the model though.
109 //------------------------------------------------------------------
110 /**
111 * write an object's superclass fields to the stream. Note to output
112 * fields in order we traverse the superclasses depth first. At
113 * each superclass we emit the locally declared fields.
114 *
115 * @param obj the object to be serialized.
116 * @param cl the (super)class type we are dealing with.
117 */
118 void write_superclass(
119 Object obj,
120 Class cl
121 ) throws Exception
122 {
123 if (cl != null) {
124 write_superclass(obj, cl.getSuperclass());
125 write_fields(obj, cl);
126 }
127 }
128
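The access limitation behind the "public fields only" rule can be observed directly. In this sketch, `Guarded` and `AccessCheckDemo` are hypothetical names; `Guarded` is deliberately a separate top-level class so that its private field is genuinely out of reach of the code doing the reflection. Reading the public field succeeds, while reading the private one is refused with IllegalAccessException.

```java
import java.lang.reflect.Field;

// A separate top-level class, so its private field really is inaccessible
// to AccessCheckDemo.
class Guarded {
    public int shown = 7;
    private int secret = 42;
}

class AccessCheckDemo {
    // Returns the field value as text, or the exception name if the
    // reflection API refuses access to the field.
    static String tryRead(Object obj, String fieldName) throws NoSuchFieldException {
        Field f = obj.getClass().getDeclaredField(fieldName);
        try {
            return String.valueOf(f.get(obj));
        } catch (IllegalAccessException e) {
            return "IllegalAccessException";
        }
    }

    public static void main(String[] args) throws NoSuchFieldException {
        Guarded g = new Guarded();
        System.out.println("shown:  " + tryRead(g, "shown"));
        System.out.println("secret: " + tryRead(g, "secret"));
    }
}
```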
The workhorse routine is write_fields(). Principally, it consists of two
structures. The first is a loop which iterates along the fields of the
class. The second is a series of sequential if statements which deal
with the sorts of fields we may find.
Firstly we deal with the primitive data types supported by Java. These are the low level types and must be handled specially. The next special handling is for array types. This is forced upon us by the way the reflection API deals with such data. As each array is itself composed of one of the other underlying types, we must then handle that underlying type for each element of the array. For simplicity we only handle single dimensional arrays for the moment. Lastly, we handle fields which are themselves objects. These are either Strings, for which we have a special encoding, or other classes, for which we recursively call ourselves.
129 //------------------------------------------------------------------
130 /**
131 * write an object's fields to the stream. The fields we will attempt
132 * to write are all the fields declared in this class. This breaks
133 * badly if we encounter fields which are not public in particular
134 * as we will get IllegalAccessException when we try to fill them
135 * in. C'est La Vie. JDK1.1 gets around this by having a native
136 * method bypass all that inconvenient checking for it.
137 *
138 * @param obj the object to be serialized.
139 * @param classobj the class type we are dealing with.
140 */
141 void write_fields(
142 Object obj,
143 Object classobj
144 ) throws Exception {
145 Class cl;
146 Field fields[];
147 StringBuffer indentbuf = null;
148 String type = null;
149
150 // To make our output more human readable we prepend
151 // whitespace in the usual manner to show nesting of
152 // objects and their data.
153 indent++;
154 indentbuf = new StringBuffer();
155 for (int i=0 ; i<indent ; i++)
156 indentbuf.append("\t");
157
158 // once again we attempt to be flexible in what we will deal with.
159 if (classobj instanceof Class)
160 cl = (Class)classobj;
161 else
162 cl = classobj.getClass();
163
164 // detect when we've reached our limits. This is particularly
165 // the case when we're traversing the superclass chain.
166 if ((cl == null) || (cl.isInstance(new Object()))) {
167 indent--;
168 return;
169 }
170
171 // we emit the class we're dealing with as a useful diagnostic
172 println(indentbuf + "// " + classobj);
173
174 // Now we process each field in turn.
175 fields = cl.getDeclaredFields();
176 for (int i=0 ; i<fields.length ; i++) {
177 Class ctype = fields[i].getType();
178 int mod;
179
180 mod = fields[i].getModifiers();
181 if (Modifier.isStatic(mod))
182 continue;
183
184 // lowest level debug prints each field as we see it.
185 if (debug > 0)
186 System.err.println(
187 "field " + i + ": " + fields[i]);
188
189 print(indentbuf +
190 getTypeName(ctype) + " " +
191 fields[i].getName() +
192 (ctype.isArray() ?
193 "[" + Array.getLength(fields[i].get(obj)) + "]" :
194 "") +
195 " =");
196
197 if (ctype.isPrimitive()) {
198 Object curr_obj = null;
199
200 if (
201 ctype.equals(Boolean.TYPE) ||
202 ctype.equals(Character.TYPE) ||
203 ctype.equals(Byte.TYPE) ||
204 ctype.equals(Short.TYPE) ||
205 ctype.equals(Integer.TYPE) ||
206 ctype.equals(Long.TYPE) ||
207 ctype.equals(Float.TYPE) ||
208 ctype.equals(Double.TYPE) ||
209 ctype.equals(Void.TYPE)
210 ) {
211 handle_instance_field(
212 fields[i].get(obj));
213 } else {
214 throw new IOException("unknown primitive type");
215 }
216 } else if (ctype.isArray()) {
217 int j, max;
218
219 max = Array.getLength(fields[i].get(obj));
220 println(" {");
221 for (j=0 ; j<max ; j++) {
222 handle_array_field(
223 j,
224 Array.get(fields[i].get(obj),j),
225 ctype);
226 }
227 println(indentbuf + "}");
228 } else {
229 if (ctype.isInstance(new String())) {
230 handle_instance_field(
231 fields[i].get(obj));
232 } else {
233 Object nval = fields[i].get(obj);
234
235 if (nval == null)
236 println(" null");
237 else {
238 println(" {");
239 handle_instance_field(nval);
240 println(indentbuf + "}");
241 }
242 }
243 }
244 }
245
246 indent--;
247 }
248
249
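The array handling relies on the static helpers of java.lang.reflect.Array, which work on an array whose element type is not known at compile time. The sketch below isolates that technique; the class `ArrayReflectionDemo` and its method names are invented for the illustration. One helper counts dimensions by peeling off component types, the other renders any one-dimensional array, primitive or not.

```java
import java.lang.reflect.Array;

class ArrayReflectionDemo {
    // Count array dimensions by repeatedly taking the component type.
    static int dimensions(Class<?> type) {
        int depth = 0;
        while (type.isArray()) {
            depth++;
            type = type.getComponentType();
        }
        return depth;
    }

    // Render a one-dimensional array of any element type as text.
    static String render(Object array) {
        StringBuilder sb = new StringBuilder("{");
        int n = Array.getLength(array);
        for (int i = 0; i < n; i++) {
            if (i > 0)
                sb.append(", ");
            sb.append(Array.get(array, i)); // primitive elements come back boxed
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        System.out.println(render(new int[] {1, 2, 3}));
        System.out.println(dimensions(int[][].class));
    }
}
```

Because `Array.get()` returns primitives in their wrapper classes, the same instanceof tests can then be applied to array elements and to ordinary fields.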
Next we have two routines which deal with a single field and a single
element of an array respectively. The array handler simply extracts the
array element and then hands it off to the single field handler.
250 //------------------------------------------------------------------
251 void handle_instance_field(
252 Object obj
253 ) throws Exception
254 {
255 // print("[[[ " + obj + " ]]] ");
256
257 if (obj == null)
258 println(" null");
259
260 else if (
261 (obj instanceof Boolean) ||
262 (obj instanceof Character) ||
263 (obj instanceof Byte) ||
264 (obj instanceof Short) ||
265 (obj instanceof Integer) ||
266 (obj instanceof Long) ||
267 (obj instanceof Float) ||
268 (obj instanceof Double)
269 ) {
270 println(" " + obj);
271
272 } else if (obj instanceof String) {
273 print(" ");
274 writeString((String)obj);
275 println("");
276
277 } else {
278 writeAsciiObject(obj);
279 }
280 }
281
282
283 //------------------------------------------------------------------
284 void handle_array_field(
285 int j,
286 Object obj,
287 Class ctype
288 ) throws Exception
289 {
290 StringBuffer newindent = new StringBuffer();
291
292 for (int i=0 ; i<indent+1 ; i++)
293 newindent.append("\t");
294
295 if (obj == null)
296 println(newindent +
297 getTypeName(ctype) + " " +
298 j + " = null");
299 else if (
300 (obj instanceof Boolean) ||
301 (obj instanceof Character) ||
302 (obj instanceof Byte) ||
303 (obj instanceof Short) ||
304 (obj instanceof Integer) ||
305 (obj instanceof Long) ||
306 (obj instanceof Float) ||
307 (obj instanceof Double) ||
308 (obj instanceof String)
309 ) {
310 print(newindent +
311 getTypeName(ctype) + " " +
312 j + " =");
313 handle_instance_field(obj);
314 } else {
315 println(newindent +
316 getTypeName(ctype) + " " +
317 j + " = {");
318 // print(newindent + "\t");
319 indent++;
320 handle_instance_field(obj);
321 indent--;
322 println(newindent + "}");
323 }
324 }
325
326
Finally, we come to two routines which complete the handling of fields.
One is the routine which extracts the type name of the field. Once
again arrays require special handling. Lastly, there is a routine which
encapsulates the special handling required for Strings.
327 //------------------------------------------------------------------
328 String getTypeName(
329 Class ctype
330 ) throws IOException
331 {
332 String type = null;
333
334 if (ctype.isPrimitive()) {
335 type = ctype.getName();
336 } else if (ctype.isArray()) {
337 int depth = 0;
338 Class atype = ctype;
339
340 while(atype.isArray()) {
341 depth++;
342 atype = atype.getComponentType();
343 type = getTypeName(atype);
344 }
345 if (depth > 1)
346 throw new IOException(
347 "more than 1 array dimension found [" +
348 depth + "]");
349 } else {
350 type = (ctype.isInstance(new String())) ?
351 "String" :
352 "class " + ctype.getName();
353 }
354
355 return type;
356 }
357
358 //------------------------------------------------------------------
359 /**
360 * write a String to the stream.
361 *
362 * @param s the string to be serialized.
363 */
364 void writeString(
365 String s)
366 throws IOException
367 {
368 if (s == null) {
369 print("null");
370 } else if (s.length() == 0) {
371 print("\"\"");
372 } else {
373 print("\"" + s + "\"");
374 }
375 }
376
377 }
378
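To give a feel for the text this produces, here is a heavily condensed sketch of the output side. It is not the class above: `MiniAsciiWriter` and `Config` are hypothetical, only public, non-static primitive and String fields of a single class are handled, and superclasses, arrays and nested objects are left out. The layout of each line follows the `type name = value` pattern used by the full routine.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

class MiniAsciiWriter {
    // Example object, of the kind one might use as program configuration.
    public static class Config {
        public int port = 8080;
        public String host = "localhost";
        public static int ignored = 1; // static fields are skipped
    }

    // Emit "class Name = {", one "type name = value" line per field, then "}".
    static String dump(Object obj) throws IllegalAccessException {
        StringWriter buf = new StringWriter();
        PrintWriter out = new PrintWriter(buf);
        Class<?> cl = obj.getClass();

        out.println("class " + cl.getName() + " = {");
        for (Field f : cl.getDeclaredFields()) {
            int mod = f.getModifiers();
            if (Modifier.isStatic(mod) || !Modifier.isPublic(mod))
                continue;

            Class<?> type = f.getType();
            Object value = f.get(obj); // primitives arrive boxed
            if (type.isPrimitive()) {
                out.println("\t" + type.getName() + " " + f.getName() + " = " + value);
            } else if (type == String.class) {
                String text = (value == null) ? "null" : "\"" + value + "\"";
                out.println("\tString " + f.getName() + " = " + text);
            }
        }
        out.println("}");
        out.flush();
        return buf.toString();
    }

    public static void main(String[] args) throws IllegalAccessException {
        System.out.print(dump(new Config()));
    }
}
```

A file in this form can be inspected or edited by hand, which is the property that makes the format useful for configuration and debugging.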
That's all that we require to serialize our objects. Next we look at the
flip side of the process. Most of this is simply running the plumbing
backwards inside the same framework. What differences exist are due
entirely to the different nature of the two processes. Parsing an input
stream presents some different conceptual and practical problems to
writing an output stream. Many of those differences are encapsulated in
the parsing support code at the end of the class.
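The parsing problem mentioned above can be illustrated in isolation. The sketch below is an assumption about how a whitespace-delimited token reader might look on top of PushbackReader; it is not the parsing support code of the class itself, and `TokenReaderSketch` is a made-up name. It reads one token at a time and pushes the delimiter back so that the next caller sees it; it makes no attempt to skip the `//` comment lines that the output format contains.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

class TokenReaderSketch {
    // Read one whitespace-delimited token; returns null at end of input.
    static String readToken(PushbackReader in) throws IOException {
        int c = in.read();

        // Skip any leading whitespace.
        while (c != -1 && Character.isWhitespace((char) c))
            c = in.read();
        if (c == -1)
            return null;

        // Accumulate characters up to the next whitespace or end of input.
        StringBuilder sb = new StringBuilder();
        while (c != -1 && !Character.isWhitespace((char) c)) {
            sb.append((char) c);
            c = in.read();
        }

        // Leave the delimiter in the stream for whoever reads next.
        if (c != -1)
            in.unread(c);
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        PushbackReader in =
            new PushbackReader(new StringReader("class Foo = {\n\tint x = 3\n}"), 5);
        String token;
        while ((token = readToken(in)) != null)
            System.out.println(token);
    }
}
```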
1 /*
2 * A stream that allows us to read objects serialized as parsable ASCII.
3 */
4 import java.io.*;
5 import java.lang.reflect.*;
6 import java.util.*;
7
8 /**
9 * a stream that allows us to read serialized objects.
10 */
11 public class AsciiObjectInputStream extends PushbackReader
12 {
13 // the stream which is this one (mostly for handing to others).
14 private Object instance; // handle to our stream object
15
16 protected int level; // determine recursion depth.
17
18 // the size of the biggest token we might push back.
19 private final static int MAXTOKEN = 5;
20
21 //------------------------------------------------------------------
22 /**
23 * create a new AsciiObjectInputStream
24 */
25 public AsciiObjectInputStream(
26 FileInputStream in
27 ) throws IOException
28 {
29 // Hand the stream to PushbackReader to deal with.
30 super(new FileReader(in.getFD()), MAXTOKEN );
31
32 // We have a bit of state ourselves which we want handled.
33 this.instance = this;
34 this.level = 0;
35 }
36
37 //------------------------------------------------------------------
38 /**
39 * create an object from the information in the stream.
40 */
41 public synchronized Object readAsciiObject()
42 throws IOException
43 {
44 Object obj;
45 String classToken = null;
46 String className = null;
47 String brace = null;
48 boolean scan;
49
50 skipWhite(); // skip leading white space.
51
52 // read till we see a "class" token...
53 scan = true;
54 do {
55 classToken = readToken();
56
57 if (isNull(classToken))
58 return null; // null class
59 if (classToken.equals("class"))
60 scan = false;
61 } while (scan);
62
63 className = readToken();
64 if (isNull(className))
65 return null; // malformed.
66
67 if (level > 0)
68 readToken(); // skip varname.
69 readToken(); // skip '='
70
71 brace = readToken();
72 if (isNull(brace)) // when we're called recursively
73 return null; // a field may be a null Object.
74
75 try {
76 Class cl;
77
78 // build an instance of our deserialized object.
79 obj = Class.forName(className).newInstance();
80
81 // Get a handle to the class of the object.
82 cl = (obj instanceof Class) ?
83 (Class)obj :
84 obj.getClass();
85
86 // We allow a class to provide its own encoding and
87 // decoding routines. Note that the AsciiObject
88 // interface isn't mandatory for our objects but it
89 // defines the writeAsciiObject() and
90 // readAsciiObject() methods for us to use here.
91 if (obj instanceof AsciiObject) {
92 return ((AsciiObject)obj).readAsciiObject(
93 (AsciiObjectInputStream)instance );
94 }
95
96 // recurse up the chain of superclasses and process
97 // those fields...
98 read_superclass(obj, cl.getSuperclass());
99
100 // ... and iterate along the fields declared here.
101 read_fields(obj, obj);
102
103 readToken(); // skip '}'
104 }
105 catch (Exception e) {
106 throw new IOException(e.toString());
107 }
108
109 return obj;
110 }
111
112
113 //------------------------------------------------------------------
114 void read_superclass(
115 Object obj,
116 Class cl
117 ) throws Exception
118 {
119 if (cl != null) {
120 read_superclass(obj, cl.getSuperclass());
121 read_fields(obj, cl);
122 }
123 }
124
125
126 //------------------------------------------------------------------
127 void read_fields(
128 Object obj,
129 Object classobj
130 ) throws Exception
131 {
132 Class cl;
133 Field fields[];
134
135 level++;
136
137 // Get a handle to the class of the object.
138 cl = (classobj instanceof Class) ?
139 (Class)classobj :
140 classobj.getClass();
141
142 // detect when we've reached our limits. This is particularly
143 // the case when we're traversing the superclass chain.
144 if ((cl == null) || (cl.isInstance(new Object()))) {
145 level--;
146 return;
147 }
148
149 // comment is silently skipped so nothing to do here.
150
151 // process each field in turn.
152 fields = cl.getDeclaredFields();
153 for (int i=0 ; i<fields.length ; i++) {
154 Class ctype = fields[i].getType();
155 int mod;
156 String typeName = null,
157 varName = null;
158
159 mod = fields[i].getModifiers();
160 if (Modifier.isStatic(mod))
161 continue;
162
163 // primitive types are handled directly.
164 if (ctype.isPrimitive()) {
165 typeName = readToken();
166 varName = readToken();
167 readToken(); // skip "="
168
169 if (
170 ctype.equals(Boolean.TYPE) ||
171 ctype.equals(Character.TYPE) ||
172 ctype.equals(Byte.TYPE) ||
173 ctype.equals(Short.TYPE) ||
174 ctype.equals(Integer.TYPE) ||
175 ctype.equals(Long.TYPE) ||
176 ctype.equals(Float.TYPE) ||
177 ctype.equals(Double.TYPE)
178 )
179 handle_instance_field( typeName,
180 varName,
181 obj,
182 fields[i],
183 readToken() );
184 else if (ctype.equals(Void.TYPE))
185 throw new IOException("read_fields: VOID type field found!");
186 else
187 throw new IOException("read_fields: unknown primitive type");
188
189 // for arrays we need to extract the underlying type 1st
190 } else if (ctype.isArray()) {
191 StringTokenizer st = null;
192 int dim;
193 Object nval = null;
194
195 typeName = readToken();
196 if (typeName.equals("class"))
197 typeName = readToken();
198 varName = readToken();
199 readToken(); // skip "="
200 readToken(); // skip curly brace
201
202 st = new StringTokenizer(varName, "[]");
203 st.nextToken(); // skip var name
204
205 dim = Integer.parseInt(st.nextToken());
206
207 // no support for multi-dim arrays yet
208 if (st.hasMoreTokens())
209 throw new IOException("multi-dimensional array found - only one dimensional arrays supported");
210
211 // if the constructor didn't make one for us...
212 if (fields[i].get(obj) == null) {
213 fields[i].set(obj, Array.newInstance(
214 ctype.getComponentType(),
215 dim));
216 }
217
218 // pull in each element of the array
219 for (int j=0 ; j<dim ; j++) {
220 handle_array_field( typeName,
221 varName,
222 obj,
223 fields[i],
224 j);
225 }
226 readToken(); // skip curly brace
227 } else {
228
229 // Strings need special care
230 if (ctype.isInstance(new String())) {
231 typeName = readToken();
232 varName = readToken();
233 readToken();
234
235 handle_instance_field( typeName,
236 varName,
237 obj,
238 fields[i],
239 readString() );
240
241 // recurse as everything else is another class
242 } else {
243 Object nval = readAsciiObject();
244
245 handle_instance_field( typeName,
246 varName,
247 obj,
248 fields[i],
249 nval );
250 }
251 }
252 }
253
254 level--;
255 }
256
257
258 //------------------------------------------------------------------
259 // pull in a single field which isn't an array or a non-string class
260 void handle_instance_field(
261 String tname,
262 String fname,
263 Object obj,
264 Field fl,
265 Object value
266 ) throws IOException
267 {
268 String svalue = null;
269
270 // Some sanity and 'do nothing' tests
271 if (value == null) return;
272 if (value == null) return;
273
274 // Convenience to save lots of casts later.
275 if (value instanceof String) {
276 svalue = (String)value;
277
278 if (svalue.equals("null"))
279 return;
280 }
281
282 // now try to run the assignments
283 try {
284 if (fl.getType().equals(Boolean.TYPE))
285 fl.set(obj, new Boolean(svalue));
286 else if (fl.getType().equals(Character.TYPE)) {
287 char[] onechar = new char[1];
288 svalue.getChars(0, 1, onechar, 0);
289 fl.set(obj, new Character(onechar[0]));
290 } else if (fl.getType().equals(Byte.TYPE))
291 fl.set(obj, new Byte(svalue));
292 else if (fl.getType().equals(Short.TYPE))
293 fl.set(obj, new Short(svalue));
294 else if (fl.getType().equals(Integer.TYPE))
295 fl.set(obj, new Integer(svalue));
296 else if (fl.getType().equals(Long.TYPE))
297 fl.set(obj, new Long(svalue));
298 else if (fl.getType().equals(Float.TYPE))
299 fl.set(obj, new Float(svalue));
300 else if (fl.getType().equals(Double.TYPE))
301 fl.set(obj, new Double(svalue));
302 else if (fl.getType().equals((new String("").getClass())))
303 fl.set(obj, value);
304 else
305 fl.set(obj, value);
306 }
307 catch (Exception e) {
308 System.err.println(
309 "ERROR: assigning to " +fl +"\n" +
310 "\tread: " +tname +" " +fname +" = " +value);
311 throw new IOException("field assignment failure:" + e);
312 }
313 }
314
315
316 //------------------------------------------------------------------
317 // pull in a single array element. This is similar to the above but
318 // we need to tell the reflect API which array element we're
319 // dealing with.
320 void handle_array_field(
321 String tname,
322 String vname,
323 Object obj,
324 Field fl,
325 int idx
326 ) throws IOException
327 {
328 Class ctype = fl.getType();
329 String type = null,
330 var = null,
331 value = null;
332
333 while (ctype.isArray()) // should only be 1 dim
334 ctype = ctype.getComponentType();
335
336 if (
337 ctype.equals(Boolean.TYPE) ||
338 ctype.equals(Character.TYPE) ||
339 ctype.equals(Byte.TYPE) ||
340 ctype.equals(Short.TYPE) ||
341 ctype.equals(Integer.TYPE) ||
342 ctype.equals(Long.TYPE) ||
343 ctype.equals(Float.TYPE) ||
344 ctype.equals(Double.TYPE)
345 ) {
346 type = readToken();
347 var = readToken();
348 readToken(); // skip '='
349 value = readToken();
350 } else if ( ctype.equals((new String("").getClass())) ) {
351 type = readToken();
352 var = readToken();
353 readToken(); // skip '='
354 value = readString();
355 }
356
357 try {
358 if ( ctype.equals(Boolean.TYPE) )
359 Array.set(fl.get(obj), idx, new Boolean(value));
360 else if ( ctype.equals(Character.TYPE) ) {
361 char[] onechar = new char[1];
362 value.getChars(0, 1, onechar, 0);
363 Array.set(fl.get(obj), idx, new Character(onechar[0]));
364 } else if ( ctype.equals(Byte.TYPE) )
365 Array.set(fl.get(obj), idx, new Byte(value));
366 else if ( ctype.equals(Short.TYPE) )
367 Array.set(fl.get(obj), idx, new Short(value));
368 else if ( ctype.equals(Integer.TYPE) )
369 Array.set(fl.get(obj), idx, new Integer(value));
370 else if ( ctype.equals(Long.TYPE) )
371 Array.set(fl.get(obj), idx, new Long(value));
372 else if ( ctype.equals(Float.TYPE) )
373 Array.set(fl.get(obj), idx, new Float(value));
374 else if ( ctype.equals(Double.TYPE) )
375 Array.set(fl.get(obj), idx, new Double(value));
376 else if ( ctype.equals((new String("").getClass())) )
377 Array.set(fl.get(obj), idx, value);
378 else {
379 Object nobj = readAsciiObject();
380
381 value = nobj == null ?
382 null :
383 nobj.toString(); // for diag.
384 Array.set(fl.get(obj), idx, nobj);
385 }
386 }
387 catch (Exception e) {
388 System.err.println(
389 "ERROR: assigning to " +fl +"[" +idx +"]\n" +
390 "\tread: " +tname +" " +vname +" = " +value);
391 e.printStackTrace(System.err);
392
393 throw new IOException("field assignment failure:" + e);
394 }
395 }
396
397
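The per-type assignment performed by handle_instance_field above can be condensed into a single conversion step. The following is only a minimal sketch, written against a current JDK rather than JDK 1.1: the class AssignSketch, its nested Sample class and the helper names convert and assign are hypothetical and are not part of the paper's code. It uses the valueOf factories in place of the wrapper constructors and relies on Field.set unwrapping the wrapper object for primitive fields.

```java
import java.lang.reflect.Field;

public class AssignSketch {
    // Hypothetical sample class, used only to exercise the sketch.
    static class Sample {
        int count;
        boolean flag;
        char letter;
        double ratio;
        String name;
    }

    // Turn the text read from the stream into an object that Field.set()
    // will accept for a field of the given type.
    static Object convert(Class<?> type, String token) {
        if (type == boolean.class) return Boolean.valueOf(token);
        if (type == char.class)    return Character.valueOf(token.charAt(0));
        if (type == byte.class)    return Byte.valueOf(token);
        if (type == short.class)   return Short.valueOf(token);
        if (type == int.class)     return Integer.valueOf(token);
        if (type == long.class)    return Long.valueOf(token);
        if (type == float.class)   return Float.valueOf(token);
        if (type == double.class)  return Double.valueOf(token);
        if (type == String.class)  return token;
        throw new IllegalArgumentException("unsupported type: " + type);
    }

    // Look the field up by name and assign the converted value to it.
    static void assign(Object target, String fieldName, String token)
            throws ReflectiveOperationException {
        Field f = target.getClass().getDeclaredField(fieldName);
        f.setAccessible(true);
        f.set(target, convert(f.getType(), token));
    }

    public static void main(String[] args) throws Exception {
        Sample s = new Sample();
        assign(s, "count", "333");
        assign(s, "flag", "true");
        assign(s, "letter", "x");
        assign(s, "ratio", "0.5");
        assign(s, "name", "startup");
        System.out.println(s.count + " " + s.flag + " " + s.letter
                + " " + s.ratio + " " + s.name);
    }
}
```

Keeping the conversion separate from the assignment means the same helper could serve both plain fields and array elements, which the listing above handles with two near-identical chains of tests.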
Not a lot to say here that wasn't said above and isn't said in the
comment below. Over and above that, there isn't anything really complex
here. These parsing routines are fairly plain vanilla and don't do
anything special or tricky.
398 //------------------------------------------------------------------
399 // UTILITY routines....
400 //
401 // from here on we have some utility routines which just package up
402 // the stuff we made use of earlier. That is input methods, and
403 // tests for what we're looking at.
404
405
406 //------------------------------------------------------------------
407 /**
408 * create a String from the information in the stream.
409 */
410 String readString()
411 throws IOException
412 {
413 String token = null;
414 StringBuffer sb = new StringBuffer();
415 boolean quoted = false;
416 int ch;
417
418 ch = read();
419 if (ch == '"') {
420 // sb.append((char)ch); // leading '"'?
421
422 do {
423 ch = read();
424
425 if (quoted) {
426 switch (ch) {
427 case '"':
428 case '\\':
429 sb.append((char)ch);
430 break;
431 default:
432 sb.append('\\');
433 sb.append((char)ch);
434 break;
435 }
436 quoted = false;
437 } else {
438 if (ch == '\\')
439 quoted = true;
440 else if (ch != '"')
441 sb.append((char)ch);
442 }
443 } while ((ch != '"') || quoted);
444 skipWhite();
445
446 return sb.toString();
447
448 } else if (ch == 'n') {
449 unread(ch);
450 token = readToken();
451 if (token.equals("null")) {
452 skipWhite();
453 return null;
454 }
455 }
456
457 throw new IOException("badly formatted string");
458 }
459
460
461 //------------------------------------------------------------------
462 /**
463 * return the next whitespace delimited token from the information
464 * in the stream.
465 */
466 String readToken()
467 throws IOException
468 {
469 int ch;
470 StringBuffer sb = new StringBuffer();
471
472 ch = read();
473 if (!isSpecial(ch)) {
474 sb.append((char)ch);
475
476 do {
477 ch = read();
478 if (!isWhite(ch) && ch != -1)
479 sb.append((char)ch);
480 } while (!isWhite(ch) && ch != -1); // also stop at end of stream
481 }
482
483 // skip trailing whitespace.
484 skipWhite();
485
486 return sb.toString();
487 }
488
489
490 //------------------------------------------------------------------
491 /**
492 * Skip over one or more delimiter characters.
493 */
494 void skipWhite()
495 throws IOException
496 {
497 int ch;
498 int lastch = 0;
499
500 do {
501 ch = read();
502 if ((ch=='/') && (lastch=='/')) {
503 skipLine();
504 ch = ' ';
505 }
506 lastch = ch;
507 } while (isWhite(ch));
508
509 unread(ch);
510 }
511
512 //------------------------------------------------------------------
513 /*
514 * Skip the rest of the line. Mainly used when we're processing C++
515 * double slash comments in the input stream.
516 */
517 void skipLine()
518 throws IOException
519 {
520 int ch;
521
522 while ((ch = read()) != '\n' && ch != -1); // stop at end of stream too
523 }
524
525
526 //------------------------------------------------------------------
527 /**
528 * A couple of quick tests for what we got from the stream. Hardwired
529 * as there's no good reason to allow them to be configurable.
530 */
531 boolean isSpecial(
532 int ch
533 ) {
534 return (ch == '"');
535 }
536
537 boolean isWhite(
538 int ch
539 ) {
540 boolean rv = (
541 (ch == '/') ||
542 (ch == ' ') ||
543 (ch == '\t') ||
544 (ch == '\r') ||
545 (ch == '\n')
546 );
547
548 return rv;
549 }
550
551 boolean isNull(
552 String s
553 ) {
554 if (s == null)
555 return true;
556 if (s.equals("null"))
557 return true;
558
559 return false;
560 }
561 }
562
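The token handling in the utility routines above can also be tried in isolation. The sketch below is an assumption-laden stand-in rather than the paper's class: TokenSketch and its tokens helper are hypothetical names, it reads from a java.io.PushbackReader instead of the stream class used in the listing, and it treats only a full "//" pair as the start of a comment. Unlike the listing, it stops cleanly at the end of the input.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class TokenSketch {
    static boolean isWhite(int ch) {
        return ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n';
    }

    // Skip whitespace and "//" comments. On return the next character
    // to be read is significant, or the input is exhausted.
    static void skipWhite(PushbackReader in) throws IOException {
        int ch;
        while ((ch = in.read()) != -1) {
            if (isWhite(ch)) continue;
            if (ch == '/') {
                int next = in.read();
                if (next == '/') {
                    // comment: discard everything up to the end of the line
                    while ((ch = in.read()) != -1 && ch != '\n') { }
                    continue;
                }
                if (next != -1) in.unread(next);
            }
            in.unread(ch);
            return;
        }
    }

    // Return the next whitespace delimited token, or null at end of input.
    static String readToken(PushbackReader in) throws IOException {
        skipWhite(in);
        StringBuilder sb = new StringBuilder();
        int ch;
        while ((ch = in.read()) != -1 && !isWhite(ch))
            sb.append((char) ch);
        if (sb.length() == 0) return null;
        return sb.toString();
    }

    // Convenience for experiments: split a whole string into tokens.
    static List<String> tokens(String text) throws IOException {
        // a pushback depth of 2 is needed when a lone '/' is put back
        PushbackReader in = new PushbackReader(new StringReader(text), 2);
        List<String> out = new ArrayList<>();
        String tok;
        while ((tok = readToken(in)) != null) out.add(tok);
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokens("int ifl = 333 // a comment\nlong lfl = 0"));
    }
}
```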
So, having seen the bulk of the code, let's have a look at how we use it.
This example is similar to the one we had at the start of the paper. It
differs only in that it reads its output back in and then writes the
newly read objects a second time to a new file. This allows us to
use the UNIX diff utility to compare the two files. Any differences
imply bugs in the process.
import java.io.*;
import java.net.*;

public class TestAsc
{
    public static void main(String args[])
    {
        Timestamp msg = null;
        Object obj = null;

        msg = new Timestamp("startup", 9, 30, 42, "", null);

        try {
            AsciiObjectOutputStream out = null;

            out = new AsciiObjectOutputStream(
                new FileOutputStream("ser_test"));
            out.writeAsciiObject(msg);
            out.close();

            // now try reading it back in.
            try {
                AsciiObjectInputStream in = null;

                in = new AsciiObjectInputStream(
                    new FileInputStream("ser_test"));
                obj = in.readAsciiObject();
                in.close();
            }
            catch (Exception e) { /*ignore */ }

            // now write it back out for completeness
            out = new AsciiObjectOutputStream(
                new FileOutputStream("ser_vrfy"));
            out.writeAsciiObject(obj);
            out.close();
        }
        catch (Exception e) {
            System.err.println("Exception: " + e);
            System.exit(1);
        }

        System.exit(0);
    }
}
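Where diff is not to hand, the comparison of the two files can be done from Java as well. This is a minimal sketch under stated assumptions: RoundTripCheck and firstDifference are hypothetical names, it uses the java.nio.file API of current JDKs rather than anything available in JDK 1.1, and the file names ser_test and ser_vrfy are simply the ones used in the test program above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class RoundTripCheck {
    // Return the 1-based number of the first line that differs between
    // the two files, or -1 if the files hold the same lines.
    static int firstDifference(Path a, Path b) throws IOException {
        List<String> la = Files.readAllLines(a, StandardCharsets.UTF_8);
        List<String> lb = Files.readAllLines(b, StandardCharsets.UTF_8);
        int common = Math.min(la.size(), lb.size());

        for (int i = 0; i < common; i++) {
            if (!la.get(i).equals(lb.get(i)))
                return i + 1;
        }
        // one file is a prefix of the other, or they are identical
        return la.size() == lb.size() ? -1 : common + 1;
    }

    public static void main(String[] args) throws IOException {
        Path first  = Paths.get(args.length > 0 ? args[0] : "ser_test");
        Path second = Paths.get(args.length > 1 ? args[1] : "ser_vrfy");
        int line = firstDifference(first, second);

        if (line < 0)
            System.out.println("round trip OK");
        else
            System.out.println("files differ at line " + line);
    }
}
```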
Finally, let's have a look at the output. Remember, there's nothing too
special about the format. It's only meant to be human readable and
unambiguous enough to read back in. This contrasts nicely with the
toString() diagnostics in the comment lines at the head of each class.
class Timestamp = {
    // Timestamp(startup..9:30:42..:null)
    byte bfl = 0
    int ifl = 333
    long lfl = 0
    float ffl[2] = {
        float 0 = 0.0
        float 1 = 0.0
    }
    double dfl = 0.0
    String event = "startup"
    short hours = 9
    short minutes = 30
    short seconds = 42
    String str1 = ""
    String str2 = null
    class ErrString err[5] = {
        class ErrString 0 = {
            // ErrString(startup:0)
            String event = "startup"
            int errcode = 0
        }
        class ErrString 1 = null
        class ErrString 2 = {
            // ErrString(startup:1)
            String event = "startup"
            int errcode = 1
        }
        class ErrString 3 = null
        class ErrString 4 = null
    }
}
So where does that leave us? What we have so far is far from perfect.
It does allow us to achieve the aims we set ourselves, and by this
yardstick I am happy to declare it a success. We can use this
code to output an arbitrary class, and usually a complex group of
classes, for debugging purposes. With care,
humans can edit together configuration information which can be parsed
by this code. Indeed, the output is about as good as we can expect it
to get. It is certainly sufficient for our purposes.
Clearly, the greatest shortcoming of the current code is that it is finicky about its input. That is, the input code expects the right fields in the right order with compatible data. That's not really flexible enough to keep most humans happy for long. The next version of this code can be expected to build keyed tables of the input and search those tables for the input required. Missing input would then simply imply acceptance of the default values set by the default constructor. This would expand the role of the null constructor from that of merely constructing an object to constructing one with known defaults. Humans coding input manually (editing a configuration file, for instance) would then be in a position to accept those defaults or override them.
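The keyed-table idea can be sketched as follows. This is not the planned next version, only a minimal illustration under stated assumptions: KeyedInputSketch, its nested Config class and the helpers parse, apply and unquote are hypothetical names, the input is limited to flat "type name = value" lines, and only int, boolean and String fields are handled. Fields that do not appear in the input keep the defaults set by the constructor, and the order of the input lines no longer matters.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

public class KeyedInputSketch {
    // Hypothetical configuration class; its initializers are the defaults.
    static class Config {
        int port = 8080;
        String host = "localhost";
        boolean verbose;
    }

    // Build a table of field name -> raw value from "type name = value"
    // lines. Blank lines and "//" comment lines are ignored.
    static Map<String, String> parse(String text) {
        Map<String, String> table = new HashMap<>();
        for (String line : text.split("\n")) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("//")) continue;
            int eq = line.indexOf('=');
            if (eq < 0) continue;
            String[] lhs = line.substring(0, eq).trim().split("\\s+");
            table.put(lhs[lhs.length - 1], line.substring(eq + 1).trim());
        }
        return table;
    }

    // Strip the surrounding double quotes from a string value.
    static String unquote(String v) {
        if (v.equals("null")) return null;
        if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\""))
            return v.substring(1, v.length() - 1);
        return v;
    }

    // Assign every field that has an entry in the table; leave the rest
    // at whatever value the constructor gave them.
    static void apply(Object target, Map<String, String> table)
            throws IllegalAccessException {
        for (Field f : target.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            String v = table.get(f.getName());
            if (v == null) continue;            // not supplied: keep default
            f.setAccessible(true);
            Class<?> t = f.getType();
            if (t == int.class)          f.setInt(target, Integer.parseInt(v));
            else if (t == boolean.class) f.setBoolean(target, Boolean.parseBoolean(v));
            else if (t == String.class)  f.set(target, unquote(v));
        }
    }

    public static void main(String[] args) throws Exception {
        Config c = new Config();
        apply(c, parse("boolean verbose = true\nString host = \"example.org\"\n"));
        System.out.println(c.host + ":" + c.port + " verbose=" + c.verbose);
    }
}
```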
The ability to elide unnecessary information from the stream, at least when it is authored by a human, is the principal motivation for writing field names in such a format at all. While they may be used for checking (notably absent in this implementation), they are more useful as a lookup key for the assignment of data to the fields it belongs in.
The other notable shortcomings of this code are its inability to handle multi-dimensional arrays, which is a simple matter of recoding some of the handling, and its exposure to the problems we alluded to earlier about serializing object graphs with cycles and loops in them. Once again, these facilities were not seen as central to the flexible configuration format we were principally seeking. Adding some support to correctly serialize and recover such objects is not onerous, merely another complicating factor we did not need to address at this point in time.
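As an indication of how small the multi-dimensional change could be on the input side, the sketch below allocates an array of any rank from a name carrying one bracketed size per dimension. It is only an illustration: MultiDimSketch, parseDims and allocate are hypothetical names, and the notation "grid[2][3]" is an assumed extension of the single "[n]" suffix used in the current format. It reuses the StringTokenizer approach of the existing array handling and the varargs form of Array.newInstance.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class MultiDimSketch {
    // Extract every bracketed size from a name such as "grid[2][3]".
    static int[] parseDims(String varName) {
        StringTokenizer st = new StringTokenizer(varName, "[]");
        st.nextToken();                     // skip the variable name itself

        List<Integer> sizes = new ArrayList<>();
        while (st.hasMoreTokens())
            sizes.add(Integer.valueOf(st.nextToken()));

        int[] dims = new int[sizes.size()];
        for (int i = 0; i < dims.length; i++)
            dims[i] = sizes.get(i);
        return dims;
    }

    // Allocate an array of the given element type with those sizes.
    static Object allocate(Class<?> elementType, String varName) {
        return Array.newInstance(elementType, parseDims(varName));
    }

    public static void main(String[] args) {
        float[][] grid = (float[][]) allocate(float.class, "grid[2][3]");
        System.out.println(grid.length + " x " + grid[0].length);
    }
}
```

Reading the elements back would still need a nested walk over the braces in the stream, which this sketch does not attempt.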
It is also interesting to note that we have written enough data that, with access to the classes which compose the Java compiler (itself a Java application in most environments), we could compile the class as we receive the data and only then hand the stream to the newly compiled class to deserialize. What processing can then sensibly be undertaken is another matter, but the objects can be recovered from ASCII streams because, in this case at least, the stream is essentially self-describing.
That's about it for now. More work will be happening on this since, if nothing else, we have a use for it at work and hence an interest in pursuing it. The changes we foresee are the relaxation of the input requirements discussed above and perhaps a formalization of the grammar of the output stream, which until now has developed in an ad hoc manner and bears the scars of that process. That's about all I have planned at this stage, which in turn reflects the basic fact that this is a good start.
The code shown here should be available, and with a little luck later versions may also be available. If you can't find them, and believe they are or should be, drop me a note. If you've found this useful or feel there are facilities missing which would enhance the code, I'd be interested in hearing about it too. Other than that, Good Luck!
[1] David Flanagan, Java in a Nutshell, 2nd ed. O'Reilly & Associates, Inc. ISBN 1-56592-262-X.

[2] Sun Microsystems Inc., Java Reflection API documentation. http://java.sun.com/products/jdk/1.1/docs/guide/reflection/index.html