CDFT: what goes in it/ source data
John Halleck
John.Halleck@utah.edu
Thu, 11 Jan 2001 13:28:25 -0700 (MST)
On Thu, 11 Jan 2001, Julian Todd wrote:
> [...]
> This text (in the CDATA[]), being a faithful representation of your notes
> from the cave, will not change unless there has been a transcription
> error or other blunder. After you have run your Survex parser and
> extracted the data into XML notation you could delete it and
> carry on without it. However, aside from trying to comply with
> the dogma that one must Never Represent the Same Data Twice
> Because it Might Clash, I would claim that nothing is really
> gained by throwing it away. It is in fact serving the purpose of those
> little envelopes of dried out notes you staple into your neat survey
> book. In that form it is easy to compare and check for transcription
> errors. And everyone can keep to their own quirky notational
> habits without losing anything.
> [...]
Having dealt with a large survey for many years, I agree wholeheartedly
with the underlying idea here.
Keeping a clean, unmodified original text and letting programs produce
the result of the parse, instead of trying to mark up each original line
with what it was identified as, was a great help when it came to things
like proofreading back in my LBCC days. It also meant that errant programs
were more likely to add their mangled markup in the stuff they dealt with
than they were to mangle the lines of the original.
(Of course, back then the idea of storing an image of the page was totally
out of the question.)
There seems to be no end of interesting ways that people can
abuse^H^H^H^H^H^Hdesign notations. Trying to mark the actual original
lines up with what was what can be difficult at best. An untouched
original (: Marked as such, of course :) plus heavily marked-up generated
stuff seems the best of both worlds to me.
I'd second a vote for image of original, transcribed original, and
generated marked-up stuff. (With the image being optional for us
memory-challenged folk.)
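
A minimal sketch of the "untouched original plus generated markup" layout
discussed above, in Python. Everything specific here is an assumption made
for illustration: the element names (survey, original, generated, leg), the
five-field line layout (from, to, tape, compass, clino), and the function
name build_record are all hypothetical and are not taken from any proposed
format or from Survex itself. The transcribed notes go into a CDATA section
unchanged; the parser only ever writes into the generated part and refers
back to the original by line number.

```python
from xml.dom.minidom import Document

# Hypothetical field order for one survey leg; real notations vary widely.
FIELDS = ("from", "to", "tape", "compass", "clino")


def build_record(original_text):
    """Store the notes verbatim in CDATA and the parse result beside them."""
    doc = Document()
    root = doc.createElement("survey")
    doc.appendChild(root)

    # The transcribed original, kept byte-for-byte and never marked up.
    original = doc.createElement("original")
    original.appendChild(doc.createCDATASection(original_text))
    root.appendChild(original)

    # Everything the program produces lives here, so a buggy parser can
    # only damage its own output, not the original lines.
    generated = doc.createElement("generated")
    root.appendChild(generated)
    for number, line in enumerate(original_text.splitlines(), start=1):
        values = line.split()
        if len(values) != len(FIELDS):
            continue  # unrecognised line: leave it for a human to check
        leg = doc.createElement("leg")
        leg.setAttribute("line", str(number))  # pointer back to the original
        for name, value in zip(FIELDS, values):
            leg.setAttribute(name, value)
        generated.appendChild(leg)
    return doc


notes = "1 2 10.50 045 -03\n; scribbled remark\n2 3 7.25 120 +10\n"
doc = build_record(notes)
print(doc.toprettyxml(indent="  "))
```

Because the generated part can be thrown away and rebuilt at any time,
proofreading only ever needs to compare the CDATA text against the paper
notes (or their image).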