README
Date: 4 Feb 2007
Version: 0.3 (the first public release)
SUMMARY/DESCRIPTION:
csvtoxml is a command line utility that converts (parses) comma-separated-value (CSV) data, or any other token-separated-value data, into XML-tagged data.
Optional arguments include the number of header lines, input file, output file, token chars (comma is the default) and an XSL stylesheet destination.
Speed and simplicity are the leading considerations throughout the design of this utility; verbosity and descriptiveness are sacrificed or avoided in the output tags.
csvtoxml is designed to make full use of existing tried-and-true tools for string concatenation, string replacement, file finding, data retrieval from web URLs, compression and storage.
Usage: ./csvtoxml [-i file] [-o file] [-c chars] [-d chars] [-l header lines] [-t descriptive tags] [-x xsl stylesheet]
Where options include:
  -i  set an Input filename.
  -o  set an Output filename.
  -c  set an array of Char delimiters ('\w' is space). The array is treated as individual characters, not as a string. These chars are not present in the output.
  -d  set an array of characters to Delete (ignore) while parsing. These chars are not present in the output.
  -l  set the integer number of Lines to place into the header tag.
  -t  set an array of descriptive Tags. The array must be comma separated, e.g. -t day,yr,mo,bid,ask
  -x  set an Xsl stylesheet and set a standard iso <?xml> tag.
  -h  show this Help.
Eg:
1. Convert a csv input file and redirect the result into an output file
    ./csvtoxml -i /var/datafile.csv > output.xml
2. Pipe the contents into csvtoxml and redirect the result into an output file
    more /var/datafile.csv | ./csvtoxml > output.xml
3. Pipe the contents into csvtoxml and write to an output file
    more /var/datafile.csv | ./csvtoxml -o output.xml
4. Pipe the contents into csvtoxml, use descriptive tags, and write to an output file
    more /var/datafile.csv | ./csvtoxml -o output.xml -t last,change,bid,ask
5. Pipe the contents into csvtoxml, use descriptive tags, use comma and period as delimiters, and redirect the result into an output file
    more /var/datafile.csv | ./csvtoxml -t last,change,bid,ask -c ,. > output.xml
NOTE: 'csvtoxml -help' will print the above help on the command line.
FUNCTION:
This program converts a single comma-separated-value (CSV) stream into an XML-tagged stream, using stdin or a file stream as input.
The resulting xml stream is sent to stdout; alternatively, an output file may be specified.
Comma is the default separator between values, given its popularity in corporate data communication. However, a different separating token (or set of tokens) may be specified.
End of line, \n, is the indicator for a new line of data, although an option to change this may be added in the future.
This version offers the ability to insert descriptive tags that are provided as a single csv argument string.
'-t tag1,tag2' will insert <tag1> and <tag2> around the first and second values. It will then revert to alpha tags, described below.
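As a rough illustration of how a -t argument can be turned into per-column tag names, here is a minimal C++ sketch. It is not taken from the csvtoxml source: the names splitTags and descriptiveTag are hypothetical, and skipping empty names (for example after a trailing comma) is an assumption.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a -t argument such as "day,yr,mo,bid,ask"
// on commas. Empty names are skipped (an assumption of this sketch).
std::vector<std::string> splitTags(const std::string& arg) {
    std::vector<std::string> tags;
    std::istringstream in(arg);
    std::string name;
    while (std::getline(in, name, ',')) {
        if (!name.empty()) tags.push_back(name);
    }
    return tags;
}

// Returns the descriptive tag for a zero-based column, or an empty string
// once the supplied tags run out, so the caller can fall back to alpha tags.
std::string descriptiveTag(const std::vector<std::string>& tags,
                           std::size_t column) {
    return column < tags.size() ? tags[column] : std::string();
}
```

With the argument "last,change,bid,ask", columns 0 to 3 get the four names and descriptiveTag returns an empty string from column 4 onward.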
Otherwise, the xml tags are incremented <a>, <b>, ... <z>. Tags are recycled at z; values beyond that get double, triple, quad, quint tags: <aa>, <bb>, <aaa>, etc.
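The alpha tag sequence can be sketched in a few lines of C++. This is an illustration inferred from the description above, not the actual csvtoxml code; the function name alphaTag and the zero-based column index are assumptions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the non-descriptive tag name for a
// zero-based column index. Columns 0..25 map to "a".."z"; every further
// pass over the alphabet repeats the letter once more, so column 26 is
// "aa", column 27 is "bb" and column 52 is "aaa".
std::string alphaTag(std::size_t column) {
    const char letter = static_cast<char>('a' + column % 26);
    const std::size_t repeats = column / 26 + 1;
    return std::string(repeats, letter);
}
```

Under these assumptions alphaTag(0) gives "a", alphaTag(25) gives "z" and alphaTag(26) gives "aa".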
NOTE: The xml tag for an empty value, in the case of consecutive delimiters ,, is a closing xml tag, which is valid xml.
The rudimentary, non-descriptive tags, <a>...<zzz>, may be converted to descriptive tags using a sed replace (sed s/), or by using the -t option.
This version has no facility for converting tokens into xml attributes, although this function may be added at a later date.
The utility is designed to be used with unix pipes, and to utilize the functionality of existing tools.
It is possible to pipe with cat, pr, wget, zip, grep, sed, awk for added functionality.
A useful combination is 'find' with -exec to execute csvtoxml on multiple files, such as:
    find . -type f -name "*.dat" -exec ./csvtoxml -l 3 -i {} -o {} \;
A useful combination for reading csv data directly from a URL is wget piped (or filed) into csvtoxml. The following retrieves csv data from finance.yahoo.com and pipes it into csvtoxml, whose stdout is then redirected into SPX_data.xml (use -o SPX_data.xml with Windows). The -O - option tells wget to write the download to stdout, and the URL is quoted so that the shell does not interpret the & characters:
    wget -q -O - "http://ichart.finance.yahoo.com/table.csv?s=%5EGSPC&d=1&e=5&f=2007&g=d&a=0&b=3&c=1950&ignore=.csv" | csvtoxml -l 1 > SPX_data.xml
-l 1 will place the leading one (1) line (Date,Open,High,Low,Close,Volume,Adj. Close) into a header <hd></hd> tag.
The utility reads character by character and blocks on each char read. Thus, it will parse a real-time stream of characters, acting a bit like a daemon. It will wait for characters until a final EOF is read and then exit. This is useful for converting an older CSV stream into the new XML format without major refactoring.
It is possible to use the utility tail -f in order to read an ongoing real-time stream in UNIX. A real-time file stream may also be used.
KNOWN ISSUES:
The token to quit or end its while loop is the EOF character, which is currently hardcoded as 'EOF' in the loop condition. This may pose a problem on certain operating systems, where you should insert the proper character for EOF.
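One way to avoid depending on a particular EOF value is to let the stream itself report end of input. The following is a minimal sketch using the standard C++ stream API, not the loop actually used in csv_to_xml.cpp; the function name readUntilEof is hypothetical.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for trying the helper out
#include <string>

// Hypothetical helper: reads a stream one character at a time until the
// stream reports end of input, without comparing any char against EOF.
std::string readUntilEof(std::istream& in) {
    std::string data;
    char ch;
    // get(ch) returns the stream, which tests false once no further
    // character could be read. A data byte such as 0xFF is therefore
    // never mistaken for the end of the input.
    while (in.get(ch)) {
        data.push_back(ch);
    }
    return data;
}
```

The same loop works with std::cin, where get() blocks until a character arrives, and with a std::ifstream.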
End of line, or '\n', is considered the token for ending a full line of data, starting a new one and resetting the tags. This is a hardcoded character for commonality's sake, although future versions will allow alternative tokens.
Carriage return, or '\r', is NOT considered the token for ending a full line of data, and it is ignored in UNIX.
There are no arguments required if you use stdin and stdout; hence the application will block on the read() if you invoke it with no stdin and no optional arguments.
Thus, it may be used to parse typed or cut/pasted data on the console, allowing for quick testing of the final output. Use an interrupt to kill the process; in that case the closing tag will not be written.
$ csvtoxml
<data>
07 Jan 135.0 (RFY AG-E),1.80,pc,0,0,0,19659,07 Jan 135.0 (RFY MG-E),8.30,pc,0,0,0,6910
<ln><a>07 Jan 135.0 (RFY AG-E)</a><b>1.80</b><c>pc</c><d>0</d><e>0</e><f>0</f><g>19659</g><h>07 Jan 135.0 (RFY MG-E)</h><i>8.30</i><j>pc</j><k>0</k><l>0</l><m>0</m><n>6910</n></ln>
^C
$
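The per-line conversion seen in the console session above can be approximated with the sketch below. It is an illustration only, not the csvtoxml source: the function name convertLine is hypothetical, an empty value is assumed to be written as a self-closing tag, and no escaping of XML special characters is attempted.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: wraps each value of one input line in alpha tags
// and the whole line in <ln></ln>. 'delimiters' plays the role of the -c
// characters and 'deleted' the role of the -d characters.
std::string convertLine(const std::string& line,
                        const std::string& delimiters = ",",
                        const std::string& deleted = "") {
    std::string out = "<ln>";
    std::string value;
    std::size_t column = 0;

    // Emits the value collected so far under the tag of the current column.
    auto flush = [&]() {
        const std::string tag(column / 26 + 1,
                              static_cast<char>('a' + column % 26));
        if (value.empty()) {
            out += "<" + tag + "/>";  // assumption: empty value, self-closing tag
        } else {
            out += "<" + tag + ">" + value + "</" + tag + ">";
        }
        value.clear();
        ++column;
    };

    for (char ch : line) {
        if (deleted.find(ch) != std::string::npos) continue;  // dropped char
        if (delimiters.find(ch) != std::string::npos) {
            flush();                                          // end of a value
        } else {
            value.push_back(ch);
        }
    }
    flush();  // the last value has no trailing delimiter
    return out + "</ln>";
}
```

Calling convertLine("07 Jan 135.0 (RFY AG-E),1.80,pc") yields <ln><a>07 Jan 135.0 (RFY AG-E)</a><b>1.80</b><c>pc</c></ln>, which has the same shape as the session output.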
The escape sequence '\w' is used to accept a white space character (' ') from the command line, and is converted to a single white space char for character comparison.
You may receive the following warning if you compile:
    warning: unknown escape sequence '\w'
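The '\w' handling can be sketched as follows. This is a guess at the technique rather than the code in csv_to_xml.cpp, and the function name expandDelimiters is hypothetical. The compiler warning presumably comes from writing \w with a single backslash inside a literal; doubling the backslash, as in the comparison below, compiles without that warning.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replaces each two-character sequence \w in a
// delimiter argument with a single space and copies everything else.
std::string expandDelimiters(const std::string& arg) {
    std::string out;
    for (std::size_t i = 0; i < arg.size(); ++i) {
        // '\\' is one backslash character, so no unknown escape is used.
        if (arg[i] == '\\' && i + 1 < arg.size() && arg[i + 1] == 'w') {
            out.push_back(' ');
            ++i;  // skip the 'w' that belongs to the sequence
        } else {
            out.push_back(arg[i]);
        }
    }
    return out;
}
```

For the command line argument \w, (backslash, w, comma) the result holds a space followed by a comma.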
XML/XSL FUNCTION:
This utility uses non-descriptive xml tags, <a></a> ... <z></z>, yet this does not destroy the fundamental usefulness of xml and associated technologies such as style-sheets and transformation.
The function of tag naming is not recognition by human users but recognition by a computer system. Having descriptive tags, such as <price></price> or <last></last>, is of no additional value to a system.
The tag and value <a>5.0</a> is as meaningful as <price>5.0</price> to a system.
The following xsl code will loop through an xml file produced by csvtoxml, using the non-descriptive tag schema.
<xsl:for-each select="data/ln">
  <tr>
    <th>
      <span style="font-weight:bold;color:black">
        <xsl:value-of select="a"/>
      </span>
    </th>
    <th><xsl:value-of select="b"/></th>
    <th><xsl:value-of select="c"/></th>
    <th><xsl:value-of select="d"/></th>
    <th><xsl:value-of select="e"/></th>
    <th><xsl:value-of select="f"/></th>
    <th><xsl:value-of select="g"/></th>
    <th>
      <xsl:value-of select="e - d"/>
      <!-- this is basic math function in xsl -->
    </th>
    <th>
      <span style="font-weight:bold;color:black">
        <xsl:value-of select="h"/>
      </span>
    </th>
    <th><xsl:value-of select="i"/></th>
    <th><xsl:value-of select="j"/></th>
    <th><xsl:value-of select="k"/></th>
    <th><xsl:value-of select="l"/></th>
    <th><xsl:value-of select="m"/></th>
    <th><xsl:value-of select="n"/></th>
    <th><xsl:value-of select="o"/></th>
    <th><xsl:value-of select="l - k"/></th>
  </tr>
</xsl:for-each>
INSTALLATION
Follow these steps.
1. Download the latest csvtoxml-*.zip (or .tar) file from sourceforge.net.
2. Unzip (extract) the contained .cpp, README, and other associated xml and xsl examples.
3. (OPTIONAL) Compile the source .cpp file using g++, Visual C++ or another C++ compiler, if you have not downloaded a compiled binary for your OS.
4. Copy the binary file into your binary path, for example, /usr/local/bin.
5. Update your PATH with the location of the csvtoxml binary, if required.
COMPILING (STEP 3 FROM ABOVE)
This utility uses the standard C++ libraries iostream, fstream, istream, sstream and stdlib. The one project header file is csv_to_xml.h.
Execute this at your command line to produce an executable named 'csvtoxml':
    g++ ./csv_to_xml.cpp -o csvtoxml
FUTURE VERSIONS
May include:
  an option for providing a string of key codes for descriptive tags.
  an option for ignoring full lines in the output xml.
  an option for parsing attributes using an alternative delimiter.
  examples of csv data, resulting xml and useful xsl style sheets.