Strings

A String is composed of an arbitrary number of bytes (limited only by available memory), encoded according to the ASCII standard. The special value 0 marks the end of a char sequence (ASCIIZ). Unicode strings are currently not supported; future versions may add support based on the UTF-8 encoding.

A comprehensive description of the String class can be found in the API reference.

Buffered Strings

TKS strings are objects which, besides the actual char data, also store the number of chars used (the length property, which includes the terminating 0), the maximum number of chars available (the bufferLength property), and an internal ownership flag used for constant strings. This makes it possible to reserve memory ahead of successive append operations and thereby avoid unnecessary string copies.

Example:

    String sbuf; 
    sbuf.alloc(1024);
    sbuf.empty(); // reset the number of used chars
    sbuf="hello, ";
    sbuf.append("world."); // no buffer resizing necessary

This example demonstrates how to override the internal string size prediction.
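
The effect of the reserved buffer can be made visible by tracing both properties before and after the append. The following is only a minimal sketch: it assumes that getLength() and getBufferLength() (both taken from the method listing) return the length and bufferLength properties described above, and that the + operator converts the returned integers to text.

    String sbuf;
    sbuf.alloc(1024); // reserve room for 1024 chars up front
    sbuf.empty();     // reset the number of used chars
    sbuf="hello, ";
    // assumption: getLength()/getBufferLength() mirror the two properties
    trace "length="+sbuf.getLength()+" bufferLength="+sbuf.getBufferLength();
    sbuf.append("world.");
    // if the reservation was honoured, bufferLength should not have changed
    trace "length="+sbuf.getLength()+" bufferLength="+sbuf.getBufferLength();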

The buffer of a String object may also point to a constant, invariant char sequence; a modifying operation on such a String automatically creates a copy first.

Example:

    String s<="hello"; // set buffer to a constant string
    // the following append operation will automatically create a copy
    // which will be deleted after the statement has been executed.
    trace s+", world";

A comprehensive description of all supported string operations is available in the API reference; only a listing is given here: operator !=(), operator &(), operator &&(), operator >(), operator >=(), operator <(), operator <<(), operator <=(), operator +(), operator ==(), operator [](), operator ||(), alloc(), append(), copy(), empty(), endsWith(), fixLength(), free(), freeStack(), getBufferLength(), getc(), getLength(), getWord(), indexOf(), insert(), isBlank(), lastIndexOf(), load(), loadLocal(), parseXML(), patternMatch(), print(), putc(), replace(), saveLocal(), split(), startsWith(), substring(), toLower(), toUpper(), trim(), words()

The [] operator

The [] operator is used to access single chars of a String. Accessing chars beyond the terminating ASCIIZ character is not allowed, i.e. a valid index satisfies 0 <= index < string.length.

Example:

    String s<="hello, world.";
    trace "the 6th char of the string is:\'"+tcchar(s[5])+"\'.";

The function tcchar() converts an ASCII character code (here s[5] = 44 = ',') into a printable String (","), which is 2 chars long including the ASCIIZ terminator.
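
Since access beyond the terminating 0 is not allowed, a script can compare an index against the string length before using the [] operator. The following is a minimal sketch, assuming that getLength() returns the length property and that if/else behaves as in C; the index value 7 is a hypothetical, deliberately invalid example.

    String s<="hello";
    int i=7; // hypothetical index, deliberately beyond the end of "hello"
    if(i<s.getLength())
        trace "char="+tcchar(s[i]); // only reached for a valid index
    else
        trace "index is out of range";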

String lists

When working with strings it is often necessary to split them into substrings, e.g. while parsing formatted text files. The String class therefore includes a simple stack mechanism that can attach an arbitrary number of substrings to a String. This string list (or string stack) can be iterated with the foreach statement, and single substrings can be accessed with the getWord() method.

Example:

    // load a local text file (true=ASCII mode, carriage returns ('\r')
    // are deleted) and split it into lines
    String t,s; s.loadLocal("test.txt", true); 
    s.split('\n');
    foreach t in s trace "line="+t;
    s.freeStack();

    foreach t in s.splitChar('\n') trace "line="+t;

Example:

    // split into words, do not take embedded strings into account:
    String t,s<="abc def ghi jkl \". . .\""; s.words(false);
    foreach t in s trace "t="+t;
    s.freeStack();
    
    foreach t in s.splitSpace(false) trace "t="+t;

Example:

    // split into words, take embedded strings into account, these
    // are going to be interpreted as one word:
    String t,s<="abc def ghi jkl \". . .\"";
    s.words(true);
    foreach t in s trace "t="+t;
    s.freeStack();
    
    foreach t in s.splitSpace(true) trace "t="+t;

Example:

    String s<="one two three"; s.words(1);
    trace "the 2nd word is "+s.getWord(1);
    s.freeStack();

    trace "the 3rd word is "+(s.splitSpace(0)[2]);

The list of a String object has to be freed manually after use. Otherwise, successive calls to split() or words() will not have the desired effect.
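
The following sketch shows the required order of calls when the same String is split twice: the list created by split() is freed before words() builds a new one. The input string and the ',' separator are made up for illustration.

    String t,s<="one two,three four"; // hypothetical input
    s.split(',');
    foreach t in s trace "part="+t;
    s.freeStack(); // release the first list before building a new one
    s.words(false);
    foreach t in s trace "word="+t;
    s.freeStack(); // release the second list as well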

Parsing XML files

Text files are often stored in a machine-readable form in order to allow for automatic processing. XML, the Extensible Markup Language, has evolved into a standard format for these purposes.

XML defines the basic structure and syntax of a text file. An XML file typically starts with a reference to the Document Type Definition (DTD), which an XML parser should use to validate the structure of the document. The DTD describes the set of elements, their attributes and possible values, and how the elements may be nested to form complex data structures.

TKS uses a simplified XML parser which only performs a basic syntax check (e.g. <e> must be closed with </e> on the same level). Flow text between elements and DTD validation are not supported.

The String.parseXML() method splits a String into an L/R tree: the left nodes link elements of the same hierarchy level, while the right nodes lead to the subtrees of an element. The attribute lists of elements are converted to HashTables (associative arrays). The element structure of the document is converted to TreeNodes, which store the element HashTables and make the structure accessible from scripts.

Example:

    String s<="<test><body><text value=\" \'. . .\' <test>\"/></body></test>"; 
    TreeNode t<=s.parseXML();
    TreeNode u<=t.right;
    trace "u.name="+u.name; u<=u.right;
    HashTable r<=u.objectValue;
    trace "text=\""+r["value"]+"\"";
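
Elements on the same hierarchy level can be reached through the left link in the same way. The following sketch assumes that the first child of an element is found at right and its next sibling at left, as described above; the <list> document and its name attributes are made up for illustration.

    String s<="<list><item name=\"a\"/><item name=\"b\"/></list>"; // hypothetical document
    TreeNode t<=s.parseXML();
    TreeNode u<=t.right; // first child of <list>
    HashTable r<=u.objectValue;
    trace "first item="+r["name"];
    u<=u.left; // assumption: left leads to the next element on the same level
    r<=u.objectValue;
    trace "second item="+r["name"];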

Flow text between a start-tag ("<element>") and an end-tag ("</element>") is accessible in the "<>" attribute.

Example:

    String xml="<test>&lt;hello, world.&gt;</test>";
    TreeNode t<=xml.parseXML();
    print "<"+t.name+">.[]=\""+t.objectValue["<>"]+"\"";


TkScript and the TkScript documentation are (c) Copyright 2001-2004 by Bastian Spiegel.