Polyglot Frontier: promoting multi-lingual support for Frontier
Originally published on 9/25/97; 1:25:46 PM

String Manipulation Problems

Why double-byte text has to be handled differently, and solutions so far.

What's the problem?

Simply put, double-byte text contains bytes that Frontier misinterprets as special characters.

The problem characters are [in brackets]:

the backslash [\], the chevron [«], the left brace [{], and the at sign [@]. (I think the quote ["] is one too, for glossary entries. -5/11/98 PS)

When Frontier sees these characters in the HTML suite, it does special processing. For example, the left brace and the quote trigger macro processing and glossary substitution, the chevron is used to find and delete comments in outline rendering, and the backslash is always used in Frontier to escape the next character.

However, when these problem characters appear in the second byte of double-byte text, Frontier can't distinguish them, and proceeds to process them as it normally would. This destroys the Japanese text.
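To make the collision concrete, here is a minimal Python sketch (not Frontier code). It assumes the double-byte text is encoded as Shift-JIS, which this page does not state explicitly, and the helper name risky_trail_bytes is made up for illustration. It lists the characters in a string whose second byte has the same value as one of the problem characters. The chevron is left out because its byte value depends on the Mac character set in use.

```python
# Assumption: the double-byte text is Shift-JIS. The page only says "double-byte".
# Byte values of three of the problem characters (ASCII range).
PROBLEM_BYTES = {0x5C: "backslash", 0x7B: "left brace", 0x40: "at sign"}


def risky_trail_bytes(text, encoding="shift_jis"):
    """Return (character, problem name) pairs for every double-byte
    character whose second byte equals one of the problem bytes."""
    hits = []
    for ch in text:
        raw = ch.encode(encoding)
        # Single-byte characters (plain ASCII) are skipped: a real
        # backslash or brace is supposed to be treated specially.
        if len(raw) == 2 and raw[1] in PROBLEM_BYTES:
            hits.append((ch, PROBLEM_BYTES[raw[1]]))
    return hits


if __name__ == "__main__":
    # The katakana "ソ" is encoded as the bytes 0x83 0x5C, so a byte-wise
    # scanner sees a backslash in the middle of the word.
    print(risky_trail_bytes("ソフト"))
```

Under that assumption, an ordinary word such as "ソフト" (soft) already contains a byte that a byte-wise scanner would take for a backslash.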

The Scope of the Problem

This problem affects the html.suite, specifically, html.refGlossary and html.processMacros. It also affects most of the string-handling verbs, or scripts that rely on string-handling verbs.

So it's a big problem.

Attempts have been made to patch the html suite and the string verbs. Here's a chart of the progress.

string verbs: (30)

string.addCommas (number) [no need]
string.commentDelete (string) [solved by suites.MWU]
string.countFields (string, delimiter) [solved by suites.MWU]
string.countWords (string) [see note below]
string.dateString () [no need?]
string.delete (string, index, count) [no need?]
string.filledString (string, count) [no need]
string.firstSentence (string) [no need]
string.firstWord (string) [see note below]
string.getWordChar () [see note below]
string.hasSuffix (suffix, string) [to work on?]
string.hex (number) [no need]
string.insert (source, dest, index) [no need?]
string.isAlpha (ch) [no need]
string.isNumeric (ch) [no need]
string.iso8859encode (s) [no need]
string.isPunctuation (ch) [no need]
string.KBytes (number) [no need]
string.lastWord (string) [see note below]
string.length (string) [no need]
string.lower (string) [no need]
string.memAvailString () [no need]
string.mid (string, index, count) [no need]
string.nthChar (string, index) [to work on?]
string.nthField (string, delimiter, index) [solved by suites.MWU]
string.nthWord (string, index) [see note below]
string.parseHttpArgs (string) [????]
string.patternMatch (pattern, string) [no need]
string.popLeading (string, character) [almost no need?]
string.popTrailing (string, character) [to work on]
string.processHtmlMacros (string, flaps, activeurls, claycomp, osacallback) [solved by suites.MWU]
string.replace (string, oldString, newString) [no need]
string.replaceAll (string, oldString, newString) [no need]
string.setWordChar (character) [see note below]
string.timeString () [no need?]
string.upper (string) [no need]
string.urlDecode (string) [no need]
string.urlEncode (string) [no need]

A note about words

In the chart above, the verbs that deal with words point to this note. It comes from an email from Nobumi Iyanaga:

The concept of "word" is different in "space delimited" languages like English and other "roman" languages, and the Japanese or other languages, which have no delimiter like that...

Iimori-san has made a good OSAX named TextInfo, which returns the number of "words" (in any language), the size of the one-byte text, the size of the two-byte text, the number of lines, and the size of all the text. But it has no feature like string.nthWord (string, index), etc.
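The point about words can be shown with a small Python sketch. It is not how Frontier or TextInfo counts words; count_space_words is a made-up helper that simply splits on whitespace, which is roughly what a space-delimited notion of "word" amounts to.

```python
def count_space_words(text):
    """Count "words" the space-delimited way: runs of non-whitespace."""
    return len(text.split())


if __name__ == "__main__":
    # An English sentence splits into its words.
    print(count_space_words("Frontier is fun"))
    # A Japanese sentence has no spaces, so the whole sentence
    # comes back as a single "word".
    print(count_space_words("フロンティアは楽しい"))
```

A verb like string.nthWord would need some language-aware way of finding word boundaries before it could return anything useful for text like the second example.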

Current Solutions, Future Solutions

Currently, double-byte text in Frontier runs on patches (namely, suites.MWU).

It works, but a more pleasing and solid solution involves making Frontier recognize and handle double-byte text on its own. The best place to take care of this is at the kernel level.

Although it will require the time and attention of the UserLand development team, once completed it will not have to be dealt with again, on any platform. It will also open up overseas markets for Frontier.
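As a rough idea of what "recognizing double-byte text" could mean, here is a Python sketch of a scanner that knows about lead bytes. It is only an illustration, not a description of suites.MWU or of anything in the Frontier kernel. It assumes Shift-JIS, where a byte in the range 0x81-0x9F or 0xE0-0xFC starts a two-byte character, and the function names are hypothetical.

```python
def is_lead_byte(b):
    """True if b starts a two-byte character (assuming Shift-JIS ranges)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def find_standalone(data, target):
    """Return the offsets where the byte `target` appears as a real
    single-byte character, ignoring second bytes of two-byte characters."""
    offsets = []
    i = 0
    while i < len(data):
        if is_lead_byte(data[i]):
            i += 2  # skip the lead byte and its trail byte together
            continue
        if data[i] == target:
            offsets.append(i)
        i += 1
    return offsets


if __name__ == "__main__":
    # "ソ" (bytes 0x83 0x5C) followed by a genuine backslash and "n".
    data = "ソ".encode("shift_jis") + b"\\n"
    print(data.find(b"\\"))             # naive search hits inside "ソ"
    print(find_standalone(data, 0x5C))  # lead-byte-aware search
```

The naive search reports the byte inside "ソ", while the lead-byte-aware scan reports only the genuine backslash. Any verb that looks for a special character would need the same kind of check.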


This is part of Phil's Frontier Scripting Site.
Osaka, JAPAN
Copyright © 1996-97, Phil Suh. All Rights Reserved.
http://www.filsa.net/frontier/polyglot/Frontier5/stringverbs.html
This page last built on 6/28/99; 11:09:22 PM