Parsing Strings by Using String Tokenizer

A common task when doing network programming is to break a large string down into various constituents. A developer could accomplish this task by using low-level String methods such as indexOf and substring to return substrings bounded by certain delimiters. However, the Java platform has a built-in class to simplify this process: the StringTokenizer class. This class isn't specific to network programming (the class is located in java.util, not java.net), but because string processing tends to be a large part of client-server programming, we discuss it here.
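To see what the tokenizer saves you, here is a minimal sketch that splits a string on a single delimiter character using nothing but indexOf and substring, next to the equivalent StringTokenizer loop. The class and method names (ManualVsTokenizer, splitManually, splitWithTokenizer) and the sample HTTP request line are illustrative choices, not part of any API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class ManualVsTokenizer {
  // Splits on one delimiter character using only indexOf and substring.
  // Empty pieces between adjacent delimiters are skipped, to match StringTokenizer.
  public static List<String> splitManually(String input, char delimiter) {
    List<String> tokens = new ArrayList<>();
    int start = 0;
    while (start <= input.length()) {
      int end = input.indexOf(delimiter, start);
      if (end == -1) {
        end = input.length(); // last piece runs to the end of the string
      }
      if (end > start) {
        tokens.add(input.substring(start, end));
      }
      start = end + 1;
    }
    return tokens;
  }

  // The same job done with StringTokenizer.
  public static List<String> splitWithTokenizer(String input, String delimiters) {
    List<String> tokens = new ArrayList<>();
    StringTokenizer tok = new StringTokenizer(input, delimiters);
    while (tok.hasMoreTokens()) {
      tokens.add(tok.nextToken());
    }
    return tokens;
  }

  public static void main(String[] args) {
    String requestLine = "GET /index.html HTTP/1.0";
    System.out.println(splitManually(requestLine, ' '));      // prints [GET, /index.html, HTTP/1.0]
    System.out.println(splitWithTokenizer(requestLine, " ")); // prints [GET, /index.html, HTTP/1.0]
  }
}
```

The manual version needs explicit bookkeeping for the start index, the missing final delimiter, and empty pieces; the tokenizer handles all three internally.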

The StringTokenizer Class

The idea is that you build a tokenizer from an initial string, then retrieve tokens one at a time with nextToken, using either the delimiters specified when the tokenizer was created or a new set passed as an optional argument to nextToken. You can also find out how many tokens remain (countTokens) or simply test whether any tokens remain at all (hasMoreTokens). The most common methods are summarized below.
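The basic pattern looks like the following sketch, which reads tokens with hasMoreTokens and nextToken and records the result of countTokens before each read. The describe method and its output format are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class BasicTokenLoop {
  // Reads every token, noting how many tokens remained just before each read.
  public static List<String> describe(String input) {
    List<String> lines = new ArrayList<>();
    StringTokenizer tok = new StringTokenizer(input);
    while (tok.hasMoreTokens()) {
      int remaining = tok.countTokens(); // does not advance the tokenizer
      lines.add(remaining + " left: " + tok.nextToken());
    }
    return lines;
  }

  public static void main(String[] args) {
    for (String line : describe("alpha beta gamma")) {
      System.out.println(line);
    }
    // prints:
    // 3 left: alpha
    // 2 left: beta
    // 1 left: gamma
  }
}
```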

Constructors

public StringTokenizer(String input)

This constructor builds a tokenizer from the input string, using white space (space, tab, newline, carriage return, and form feed) as the set of delimiters. The delimiters will not be included as part of the tokens returned.
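A short sketch of the one-argument constructor follows. Runs of mixed white space act as a single separator, and leading or trailing white space produces no empty tokens. The DefaultDelimiters class and its tokens helper are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class DefaultDelimiters {
  public static List<String> tokens(String input) {
    List<String> result = new ArrayList<>();
    // One-argument constructor: the delimiters are " \t\n\r\f".
    StringTokenizer tok = new StringTokenizer(input);
    while (tok.hasMoreTokens()) {
      result.add(tok.nextToken());
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(tokens("  one\ttwo\n\nthree\r\nfour\ffive  "));
    // prints [one, two, three, four, five]
  }
}
```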

public StringTokenizer(String input, String delimiters)

This constructor creates a tokenizer from the input string, using the specified delimiters. The delimiters will not be included as part of the tokens returned.
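With the two-argument constructor, each character in the delimiter string is a separate delimiter; the string is not matched as a whole. Adjacent delimiters also produce no empty tokens. The following sketch, using a query-string style input chosen for illustration, shows both points.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class CustomDelimiters {
  public static List<String> tokens(String input, String delimiters) {
    List<String> result = new ArrayList<>();
    StringTokenizer tok = new StringTokenizer(input, delimiters);
    while (tok.hasMoreTokens()) {
      result.add(tok.nextToken());
    }
    return result;
  }

  public static void main(String[] args) {
    // '&' and '=' are two independent delimiters; "&&" yields no empty token.
    System.out.println(tokens("user=marty&lang=java&&page=2", "&="));
    // prints [user, marty, lang, java, page, 2]
  }
}
```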

public StringTokenizer(String input, String delimiters, boolean includeDelimiters)

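This constructor builds a tokenizer from the input string using the specified delimiters. If the third argument is true, each delimiter character is also returned as a token of its own (a string of length one); if it is false, the delimiters are skipped, just as with the two-argument constructor.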
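Setting the third argument to true is handy when the delimiters carry meaning, as operators do in an arithmetic expression. In this sketch (the class and method names are hypothetical), each delimiter comes back as a one-character token, and adjacent delimiters are returned one at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class ReturnDelimiters {
  public static List<String> tokens(String input, String delimiters) {
    List<String> result = new ArrayList<>();
    // true: every delimiter character is returned as its own token.
    StringTokenizer tok = new StringTokenizer(input, delimiters, true);
    while (tok.hasMoreTokens()) {
      result.add(tok.nextToken());
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(tokens("3+4*2", "+-*/")); // prints [3, +, 4, *, 2]
    System.out.println(tokens("a;;b", ";"));     // prints [a, ;, ;, b]
  }
}
```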

Methods

public String nextToken()

This method returns the next token. The method throws a NoSuchElementException if no characters remain or only delimiter characters remain.
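A tokenizer built from a string that is empty or consists only of delimiters has no tokens at all, so even the first call to nextToken fails. The firstTokenOrNull helper below is a hypothetical illustration of catching that case.

```java
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

public class NoTokensLeft {
  // Returns the first token, or null when nextToken throws because the
  // input is empty or contains only delimiter characters.
  public static String firstTokenOrNull(String input, String delimiters) {
    StringTokenizer tok = new StringTokenizer(input, delimiters);
    try {
      return tok.nextToken();
    } catch (NoSuchElementException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(firstTokenOrNull("", ","));     // prints null
    System.out.println(firstTokenOrNull(",,,", ","));  // prints null
    System.out.println(firstTokenOrNull(",,x,", ",")); // prints x
  }
}
```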

public String nextToken(String delimiters)

This method changes the set of delimiters, then returns the next token. The new delimiter set remains in effect for subsequent calls. The method throws a NoSuchElementException if no characters remain or only delimiter characters remain.
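Because the new delimiter set stays in effect, a single call to nextToken(String) can switch the tokenizer from one format to another partway through a string. The sketch below assumes a made-up record format, label:item,item,... (the class, method, and format are all hypothetical). Note that the new set still includes ':', so the colon left over after the label is skipped rather than becoming part of the first item.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class SwitchDelimiters {
  // Parses "label:item,item,...". Throws NoSuchElementException if the
  // record has no items after the label.
  public static List<String> parse(String record) {
    List<String> parts = new ArrayList<>();
    StringTokenizer tok = new StringTokenizer(record, ":");
    parts.add(tok.nextToken());     // the label, read with ":" only
    parts.add(tok.nextToken(":,")); // first item; delimiters are now ":,"
    while (tok.hasMoreTokens()) {
      parts.add(tok.nextToken());   // no argument: still uses ":,"
    }
    return parts;
  }

  public static void main(String[] args) {
    System.out.println(parse("courses:Java,Servlets,JSP"));
    // prints [courses, Java, Servlets, JSP]
  }
}
```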

public int countTokens()

This method returns the number of tokens remaining, based on the current set of delimiters.
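Since countTokens does not consume anything, it can be used to size an array before the tokens are read. A minimal sketch follows; CountThenRead and toArray are hypothetical names.

```java
import java.util.StringTokenizer;

public class CountThenRead {
  // Sizes the array with countTokens, then fills it with nextToken.
  public static String[] toArray(String input, String delimiters) {
    StringTokenizer tok = new StringTokenizer(input, delimiters);
    String[] result = new String[tok.countTokens()];
    for (int i = 0; i < result.length; i++) {
      result[i] = tok.nextToken();
    }
    return result;
  }

  public static void main(String[] args) {
    String[] octets = toArray("192.168.0.1", ".");
    System.out.println(octets.length);            // prints 4
    System.out.println(String.join(" ", octets)); // prints 192 168 0 1
  }
}
```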

public boolean hasMoreTokens()

This method determines whether any tokens remain, based on the current set of delimiters. Most applications should either check for tokens before calling nextToken or catch a NoSuchElementException when calling nextToken. Note that in older JDK releases, hasMoreTokens had the side effect of advancing the internal position, which could yield unexpected results in the rare but possible sequence of checking hasMoreTokens with one delimiter set, then calling nextToken with another delimiter set.
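The two styles look like this in a small, hypothetical helper that extracts the port from a host:port string and falls back to a default when the port is missing. Both methods return the same results; which one reads better is largely a matter of taste.

```java
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

public class GuardedParsing {
  // Style 1: check hasMoreTokens before each call to nextToken.
  public static String portByChecking(String hostAndPort, String defaultPort) {
    StringTokenizer tok = new StringTokenizer(hostAndPort, ":");
    if (tok.hasMoreTokens()) {
      tok.nextToken(); // skip the host
    }
    return tok.hasMoreTokens() ? tok.nextToken() : defaultPort;
  }

  // Style 2: call nextToken directly and catch the exception.
  public static String portByCatching(String hostAndPort, String defaultPort) {
    StringTokenizer tok = new StringTokenizer(hostAndPort, ":");
    try {
      tok.nextToken(); // skip the host
      return tok.nextToken();
    } catch (NoSuchElementException e) {
      return defaultPort;
    }
  }

  public static void main(String[] args) {
    System.out.println(portByChecking("www.example.com:8080", "80")); // prints 8080
    System.out.println(portByCatching("www.example.com", "80"));      // prints 80
  }
}
```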

Example: Interactive Tokenizer

A good way to get a feel for how StringTokenizer works is to try a bunch of test cases. Listing 17.4 gives a simple class that lets you enter an input string and a set of delimiters on the command line and prints the resultant tokens one to a line.

Listing 17.4 TokTest.java
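The following is a minimal sketch of a class with the behavior just described, written from that description rather than copied from the original listing. It assumes exactly two command-line arguments, the input string followed by the delimiter set. The tokenLines helper is a hypothetical addition that builds the printed text, so the tokenizing logic can be exercised without the command line.

```java
import java.util.StringTokenizer;

public class TokTest {
  // Builds the text the program prints: one token per line.
  public static String tokenLines(String input, String delimiters) {
    StringBuilder out = new StringBuilder();
    StringTokenizer tok = new StringTokenizer(input, delimiters);
    while (tok.hasMoreTokens()) {
      out.append(tok.nextToken()).append(System.lineSeparator());
    }
    return out.toString();
  }

  public static void main(String[] args) {
    if (args.length != 2) {
      System.out.println("Usage: java TokTest inputString delimiters");
      return;
    }
    System.out.print(tokenLines(args[0], args[1]));
  }
}
```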

Here is TokTest in action: