README

 

Date: 4 Feb 2007

Version: 0.3

aka The first public release.

 

SUMMARY/DESCRIPTION:

 

                        csvtoxml is a command line utility that converts (parses) comma-separated-value (csv) data, or any other token-separated-value data,

                        into xml tagged data.

 

                        Optional arguments include number of header lines, input file, output file, token chars (comma is default) and xsl sheet destination.

 

                        Speed and simplicity are the leading considerations throughout the design of this utility; verbosity and descriptiveness are deliberately sacrificed or avoided in the output tags.

                        csvtoxml is designed to fully utilize existing tried-and-true tools in string concatenation, string replacement, file find, data retrieval from web URLs, compression and storage.

 

Usage: ./csvtoxml [-i file] [-o file] [-c chars] [-d chars] [-l header lines] [-t descriptive tags] [-x xsl stylesheet] 

 

Where options include:

        -i set an Input filename.

        -o set an Output filename.

        -c set an array of Char delimiters ( '\w' is space). The array is treated as individual characters, not as a string. Delimiter chars are not present in the output.

        -d set an array of characters to Delete (ignore) while parsing. These chars will not be present in the output.

        -l set the integer number of Lines to place into header tag.

        -t set an array of descriptive Tags. The array must be comma separated. eg. -t day,yr,mo,bid,ask,

        -x set an Xsl stylesheet and write a standard iso <?xml> tag.

        -h show this Help.
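The -t argument is a single comma separated string. A minimal sketch of how such a string could be split into tag names follows. The function name split_tags is hypothetical and is not taken from the csvtoxml source; it also assumes that a trailing comma, as in the example above, is simply ignored.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a "-t" argument such as "day,yr,mo,bid,ask,"
// into individual tag names. Empty names (for example from a trailing or
// doubled comma) are skipped; this is an assumption, not documented behavior.
std::vector<std::string> split_tags(const std::string& arg) {
    std::vector<std::string> tags;
    std::istringstream stream(arg);
    std::string name;
    while (std::getline(stream, name, ',')) {
        if (!name.empty()) {
            tags.push_back(name);
        }
    }
    return tags;
}
```

Called with "day,yr,mo,bid,ask," this sketch yields five names, the last one being "ask".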

                       

Eg:

         1. Convert a csv input file and redirect the result into an output file

        ./csvtoxml -i /var/datafile.csv > output.xml

 

         2. Pipe the contents into csvtoxml and redirect stdout into an output file

        more /var/datafile.csv | ./csvtoxml > output.xml

 

         3. Pipe the contents into csvtoxml and write to output file

        more /var/datafile.csv | ./csvtoxml -o output.xml

 

         4. Pipe the contents into csvtoxml, use descriptive tags, and write to output file

        more /var/datafile.csv | ./csvtoxml -o output.xml -t last,change,bid,ask

 

         5. Pipe the contents into csvtoxml, use descriptive tags, use comma and period as delimiters, and redirect stdout into an output file

        more /var/datafile.csv | ./csvtoxml -t last,change,bid,ask -c ,. > output.xml

 

 

NOTE:  'csvtoxml -help' will provide the above help on the commandline

 

 

FUNCTION:

 

                        This program converts a single comma-separated-value (CSV) stream into an xml tagged stream using stdin or a file stream as input.

        The resulting xml stream is sent to stdout, or to an output file if one is specified.

        Comma is the default separator between values, given its popularity in corporate data communication. However, a different separating token (or set of tokens) may be specified.

        End of line, \n, is the indicator for a new line of data, although an alternative indicator may be added as an option in the future.

       

        This version offers the ability to insert descriptive tags that are provided as a single csv argument string.

        '-t tag1,tag2' will insert <tag1> and <tag2> around the first and second values. The remaining values revert to alpha tags, described below.

       

        Otherwise, the xml tags are incremented <a>, <b>, ... <z>. Tags are recycled at z: additional values get double, triple, quad, quint tags: <aa>, <bb>, ... <aaa>, etc.

                        NOTE: The xml tag for an empty value, in the case of consecutive delimiters ,, is a closing xml tag, which is valid xml. 
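The tag sequence described above can be sketched as follows. This is an illustration only, not the code from csv_to_xml.cpp: the names alpha_tag and tag_for are hypothetical, and the sketch assumes that after the descriptive tags run out, a value takes the alpha tag for its own position (so the third value becomes <c> when two descriptive tags are given).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Alpha tag for a zero-based column index:
// 0..25 -> "a".."z", 26..51 -> "aa".."zz", 52..77 -> "aaa".."zzz", and so on.
std::string alpha_tag(std::size_t column) {
    const std::size_t repeats = column / 26 + 1;
    const char letter = static_cast<char>('a' + column % 26);
    return std::string(repeats, letter);
}

// Tag for a column when -t supplied some descriptive names.
// Assumption: columns past the end of the list fall back to the alpha tag
// for their own position rather than restarting at "a".
std::string tag_for(std::size_t column, const std::vector<std::string>& descriptive) {
    if (column < descriptive.size() && !descriptive[column].empty()) {
        return descriptive[column];
    }
    return alpha_tag(column);
}
```

With this scheme the 27th value on a line is tagged <aa> and the 53rd is tagged <aaa>.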

                       

                        The rudimentary & non-descriptive tags, <a>...<zzz>, may be converted to descriptive tags using a sed substitution (the sed s command), or by using the -t option.

                       

        This version has no facility for converting tokens as xml attributes, although this function may be added at a later date.

 

                        The utility is designed to be used with unix pipes, and utilize the functionality of existing tools.

                        It is possible to pipe with cat, pr, wget, zip, grep, sed, awk for added functionality.

                       

                        A useful combination is 'find' with -exec to execute csvtoxml on multiple files, such as:

                                    find . -type f -name "*.dat" -exec ./csvtoxml -l 3 -i {} -o {}.xml \;

                       

                        A useful combination for reading csv data directly from a URL is wget piped (or filed) into csvtoxml. The following retrieves csv data from

                        finance.yahoo.com and pipes it into csvtoxml, which then redirects stdout into SPX_data.xml (Use -o SPX_data.xml with Windows):

                       

                                    wget -q -O - "http://ichart.finance.yahoo.com/table.csv?s=%5EGSPC&d=1&e=5&f=2007&g=d&a=0&b=3&c=1950&ignore=.csv" | csvtoxml -l 1 > SPX_data.xml

                                   

                                    -l 1 will place the leading one (1) line, (Date,Open,High,Low,Close,Volume,Adj. Close) into a header <hd></hd> tag.

                                                                       

                        The utility reads character by character and it blocks on the char read. Thus, it will parse a real-time stream of characters, acting a bit like a

                        daemon. It will wait on the character until a final EOF character is read and then exit. This is useful for converting an older CSV stream into the

                        new XML format without major refactoring.

                        It is possible to pipe the output of 'tail -f' into csvtoxml in order to read an ongoing real-time stream in UNIX. A real-time file stream may also be used.

 

KNOWN ISSUES:

 

                        The token to quit or end its while loop is the EOF character, which is currently hardcoded as 'EOF' in the loop condition. This may pose a problem on

                        certain operating systems, in which case you should insert the proper character for EOF.

                       

                        End of line, or '\n', is considered the token for ending a full line of data, starting a new one and resetting the tags. This is a hardcoded character for the sake of commonality, although future versions will allow alternative tokens.

                        Carriage return, or '\r' is NOT considered the token for ending a full line of data, and it is ignored in UNIX.

                       

                        There are no arguments required if you use stdin and stdout; hence the application will block on the read() if you invoke it with no stdin and no optional arguments.

                        Thus, it may be used to parse typed or cut/pasted data on the console, allowing for quick testing of the final output. Use an interrupt to kill the process. In that case the closing tag is not written.

                       

                        $ csvtoxml

                        <data>

                        07 Jan 135.0 (RFY AG-E),1.80,pc,0,0,0,19659,07 Jan 135.0 (RFY MG-E),8.30,pc,0,0,0,6910

                        <ln><a>07 Jan 135.0 (RFY AG-E)</a><b>1.80</b><c>pc</c><d>0</d><e>0</e><f>0</f><g>19659</g><h>07 Jan 135.0 (RFY MG-E)</h><i>8.30</i><j>pc</j><k>0</k><l>0</l><m>0</m><n>6910</n></ln>

                        ^C

                        $

 

                        The escape sequence '\w' is used to accept a white space character (' ') from the command line, but is converted to a single white space char for character comparison.

                        You may receive the following warning when you compile the source:

                                                warning: unknown escape sequence '\w'
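A minimal sketch of the '\w' conversion, under the assumption that the delimiter argument arrives as a plain string containing a backslash followed by 'w'. The function name expand_delims is hypothetical. Comparing against '\\' and 'w' separately, instead of writing '\w' as a C++ character literal, also avoids the compiler warning shown above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replace each two-character sequence backslash + 'w'
// in a -c or -d argument with a single space character.
std::string expand_delims(const std::string& arg) {
    std::string out;
    for (std::size_t i = 0; i < arg.size(); ++i) {
        if (arg[i] == '\\' && i + 1 < arg.size() && arg[i + 1] == 'w') {
            out += ' ';
            ++i;  // skip the 'w' that belongs to the escape
        } else {
            out += arg[i];
        }
    }
    return out;
}
```

For an argument such as -c ',\w' this produces the two delimiter characters comma and space.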

 

XML/XSL FUNCTION:

 

                        This utility uses non-descriptive xml tags, <a></a> ... <z></z>, yet this does not destroy the fundamental usefulness of xml and associated technologies such as style-sheets and transformation.

                        The function of tag naming is not recognition by human users but recognition by a computer system. Having descriptive tags, such as <price> or <last>, is of no additional value to a system.

                        The tag and value <a>5.0</a> is as meaningful as <price>5.0</price> to a system.

                       

           

                        The following xsl code will loop through an xml file produced by csvtoxml, using the non-descriptive tag schema.

                                                           

                                                <xsl:for-each select="data/ln">

 

                                                                        <tr>

                                                                        <th>

                                                                                    <span style="font-weight:bold;color:black">

                                                                                                <xsl:value-of select="a"/>

                                                                                    </span>

                                                                        </th><th>

                                                                                    <xsl:value-of select="b"/>

                                                                        </th><th>

                                                                                    <xsl:value-of select="c"/>

                                                                        </th><th>

                                                                                    <xsl:value-of select="d"/>

                                                                        </th><th>

                                                                                    <xsl:value-of select="e"/>

                                                                        </th><th>

                                                                                    <xsl:value-of select="f"/>

                                                                        </th><th>

                                                                                    <xsl:value-of select="g"/>

                                                                        </th><th>

                                                                                    <xsl:value-of select="e - d"/>  <!-- basic arithmetic in xsl: value of e minus value of d -->

                                                                        </th><th>

                                                                                    <span style="font-weight:bold;color:black">

                                                                                                <xsl:value-of select="h"/>

                                                                                    </span>

                                                                        </th><th>

                                                                                    <xsl:value-of select="i"/>

                                                                        </th><th>

                                                                                    <xsl:value-of select="j"/>

                                                                        </th><th>

                                                                                    <xsl:value-of select="k"/>

                                                                        </th><th>

                                                                                    <xsl:value-of select="l"/>

                                                                        </th><th>

                                                                                    <xsl:value-of select="m"/>

                                                                        </th><th>

                                                                                    <xsl:value-of select="n"/>

                                                                        </th><th>

                                                                                    <xsl:value-of select="o"/>

                                                                        </th><th>

                                                                                    <xsl:value-of select="l - k"/>

                                                                        </th>

                                                                        </tr>

                                                                       

                                                </xsl:for-each>

 

 

INSTALLATION

 

                        Follow these steps.

                                    1. Download the latest csvtoxml-*.zip (or .tar) file from sourceforge.net.

                                    2. Unzip (extract) the contained .cpp, README, and other associated xml and xsl examples

                                    3. (OPTIONAL) Compile the source .cpp file using g++, Visual C++, or another C++ compiler, if you have not downloaded a compiled binary for your OS

                                    4. Copy the binary file into your binary path, for example, /usr/local/bin

                                    5. Update your PATH with the location of the csvtoxml binary, if required

                                   

COMPILING (STEP 3 FROM ABOVE)

 

                        This utility uses the standard c++ libraries iostream, fstream, istream, sstream, and stdlib. The one project header file is csv_to_xml.h.

                        Execute this at your commandline to produce an executable entitled 'csvtoxml'

                                    g++ ./csv_to_xml.cpp -o csvtoxml

 

FUTURE VERSIONS

 

                        May include

                                    an option for providing a string of key codes for descriptive tags.

                                    an option for ignoring full lines in the output xml

                                    an option for parsing attributes using an alternative delimiter

                                    examples of csv data, resulting xml and useful xsl style sheets.

 

 

 
