taint. Specifying data transformations

^taint[text]
^taint[transformation type][text]

Parser applies automatic data transformations to protect your system against intrusion, and the default security level is high. This protection works even if your code never uses the taint operator. Overriding it with these operators (especially with the as-is transformation) may introduce security vulnerabilities, so study the mechanism carefully before writing code.

The taint operator marks the text it receives as "needing a transformation of a certain type". If the transformation type is unspecified, taint marks the text as "tainted" (needing a transformation of an as yet undefined type). Text marked "tainted" is subject to the same transformation that is applied to external text (coming from a form field, database, file, cookies, etc.).

The text is only marked; the transformation itself is performed later, when the apply-taint operator is called or when the document is output to the browser, sent to an SQL server, saved to a file, sent by e-mail, etc.

For simplicity, you can think of it as if Parser treats external text as ^taint[external text], and text typed in the body of the code as ^taint[optimized-as-is][typed text].
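The marking scheme described above can be modelled roughly as follows. This is an illustrative Python sketch, not Parser's implementation: the names Piece, ESCAPERS and render are hypothetical, only two output contexts are modelled (html, and sql with quote doubling), and whitespace optimization is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Piece:
    """A fragment of text together with the mark it carries."""
    text: str
    mark: str  # "clean", "tainted", or an explicit type such as "as-is"


def html_escape(s: str) -> str:
    # & is replaced first so that the entities added below are not escaped again.
    return (s.replace("&", "&amp;").replace("<", "&lt;")
             .replace(">", "&gt;").replace('"', "&quot;"))


def sql_escape(s: str) -> str:
    # Quote doubling; assumed here as the simplest SQL escaping rule.
    return s.replace("'", "''")


ESCAPERS = {"as-is": lambda s: s, "html": html_escape, "sql": sql_escape}


def render(pieces, context):
    """Apply the deferred transformations at output time."""
    out = []
    for p in pieces:
        if p.mark == "clean":
            out.append(p.text)                      # typed text is trusted
        elif p.mark == "tainted":
            out.append(ESCAPERS[context](p.text))   # type chosen by the output context
        else:
            out.append(ESCAPERS[p.mark](p.text))    # explicit type given to taint
    return "".join(out)


page = [Piece("You specify: '", "clean"),
        Piece("<b>Tom's</b>", "tainted"),
        Piece("'", "clean")]
print(render(page, "html"))
print(render(page, "sql"))
```

The same marked value is escaped differently depending on where it ends up, which is why the mark is stored and the transformation is postponed until output.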

Automatic transformations protect against unsafe external data. For example, an SQL query containing ^string:sql{SELECT name FROM table WHERE uid = '$form:uid'} (again, without taint) cannot be subverted by SQL injection using the parameter "?uid=' OR 1=1 OR '", because Parser escapes the single quotes in the received $form:uid before sending the query to the server.
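To see why escaping defeats this attack, here is a small Python sketch of the two escaping styles that the transformation table in this section lists for sql (quote doubling, and backslash prefixing of ' and \). The function names are hypothetical, and the code only models the idea; it is not how Parser builds queries.

```python
def escape_by_doubling(value: str) -> str:
    # Style listed for Oracle, ODBC and SQLite: ' becomes ''.
    return value.replace("'", "''")


def escape_with_backslash(value: str) -> str:
    # Style listed for PgSQL: ' and \ are prefixed with \.
    # The backslash is handled first so the ones added for quotes are not doubled.
    return value.replace("\\", "\\\\").replace("'", "\\'")


def build_query(uid: str, escape) -> str:
    # The external value is escaped before it is placed between the quotes.
    return "SELECT name FROM table WHERE uid = '" + escape(uid) + "'"


attack = "' OR 1=1 OR '"
print(build_query(attack, escape_by_doubling))
print(build_query(attack, escape_with_backslash))
```

In both results the injected quotes stay inside the string literal, so the OR 1=1 condition is compared as data rather than executed as part of the query.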

Text within the body is also transformed automatically. Parser optimizes whitespace characters: spaces, tabs and line breaks. A run of consecutive whitespace characters is replaced with the first character of the run. In other words, several spaces typed in a row are output as one. If you need to disable this optimization (for example, inside a <pre> element), do it explicitly, for instance:

<pre>
^taint[as-is][
   I strode off the
   high cathedral
    top-most step like a
     miracle worker, or a
      Blessed
       passing the final exam for
        Saint. The
         city expanded at my
          feet. For one
           pico-second, I
            flew.
]
</pre>
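The whitespace optimization that this example switches off can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that only space, tab and newline take part, as the text above states.

```python
import re

# A run of one or more spaces, tabs or newlines.
_WHITESPACE_RUN = re.compile(r"[ \t\n]+")


def optimize_whitespace(text: str) -> str:
    """Replace every run of whitespace with the first character of that run."""
    return _WHITESPACE_RUN.sub(lambda m: m.group(0)[0], text)


print(repr(optimize_whitespace("I strode off the\n   high cathedral")))
```

Because the first character of the run wins, a line break followed by indentation collapses to the line break alone, which is exactly what destroys the staircase layout of the verse unless as-is is used.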

Example
$clean[<br />]
# the above expression is equivalent to this: $clean[^taint[optimized-as-is][<br />]]

$tainted[^taint[<br />]]

Strings: ^if($clean eq $tainted){match}{do not match}<br />

Tainted data-'$tainted'<br />
Untainted data-'$clean'<br />

This example shows that although the comparison reports the strings as equal, a browser displays different results: the untainted string is not transformed, whereas '<' and '>' in the tainted one are replaced with '&lt;' and '&gt;'.

Example
$city[New York]
<a href="city.html?city=^taint[uri][$city]">$city</a>

As a result, the contents of variable city are transformed to the uri type. Cyrillic characters, white spaces and other characters that must be encoded are replaced with their hexadecimal codes in the form %XX.
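As a rough illustration of the uri transformation, the Python sketch below keeps digits, Latin letters and the characters _ - . and replaces everything else with %XX. The function name is hypothetical, and two details are assumptions: the text is encoded as UTF-8 before conversion, and the hex digits are uppercase.

```python
import string

# Characters that are left untouched in this model.
_SAFE = set(string.ascii_letters + string.digits + "_-.")


def uri_escape(text: str, charset: str = "utf-8") -> str:
    """Replace every character outside the safe set with %XX per byte."""
    out = []
    for ch in text:
        if ch in _SAFE:
            out.append(ch)
        else:
            # A multi-byte character yields one %XX group per byte.
            out.extend(f"%{byte:02X}" for byte in ch.encode(charset))
    return "".join(out)


print(uri_escape("New York"))
print(uri_escape("Москва"))
```

With these assumptions, the space in New York becomes %20, so the link in the example stays a valid query string.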

Example
Outputting and saving user submitted data and generating XML<br />
You specify: '$form:field'

^connect[$SQL.connect-string]{
   ^void:sql{INSERT INTO news (body) VALUES ('$form:field')}
}

$doc[^xdoc::create{<?xml version="1.0" encoding="UTF-8"?>
<root>
   <data>$form:field</data>
</root>
}]

In this case you do not need taint: all the necessary transformations occur automatically, with transformation type optimized-html for output to the browser, sql for sending data to the server, and xml for creating the xdoc object.
Note that you also do not need to write taint in SQL queries when saving data to a database through an administrative interface.

Example
Outputting user submitted data or data coming from a database (may contain tags) to an edit form<br />
^if(def $form:body){
   $body[$form:body]
}{
   ^connect[$SQL.connect-string]{
      $body[^string:sql{SELECT body FROM news WHERE news_id = $id}]
   }
}
<textarea>$body</textarea>

In this example optimized-html transformation will be performed automatically, because the data submitted by the user or coming from a database are tainted. If the data contains any tags, they will not affect the page. Remember that sequences of white spaces in $body will be optimized during output.

Example
Outputting data coming from a database containing administrator written tags<br />
^connect[$SQL.connect-string]{
  $body[^string:sql{SELECT body FROM news WHERE news_id = $id}]
}
^taint[as-is][$body]

Here you should use taint with transformation type as-is, because the tags the administrator included in the news text must not undergo any transformation. This method must not be used for data submitted by website visitors, such as guest book entries, forum posts, etc.

Example
Outputting user submitted data or data coming from a database (may contain tags) to an edit form, preserving whitespace<br />
^if(def $form:body){
  $body[$form:body]
}{
  ^connect[$SQL.connect-string]{
     $body[^string:sql{SELECT body FROM news WHERE news_id = $id}]
  }
}
<textarea>^taint[html][$body]</textarea>

In this case, use taint with transformation type html: it keeps any tags in the data from breaking the page and disables whitespace optimization.

In the above examples the taint operator was used only three times: to display administrator-written tags in database-derived text, to disable whitespace optimization, and to output a query string containing characters that must be encoded (for example, white spaces and Cyrillic letters).
Otherwise there was no need for taint, and Parser managed everything on its own.

Remember that it is better not to use this operator unless necessary.

A transformation replaces certain characters with others according to built-in transformation tables. The following transformation types are available:

as-is
file-spec
http-header
mail-header
uri
sql
js
json   [3.4.1]
parser-code   [3.4.0]
regex
xml
html

optimized-as-is
optimized-xml
optimized-html

Transformation table

as-is	no transformation
file-spec	characters * ? " < > | are replaced with _XX, where XX is the character's hex code
uri	all characters other than digits and lowercase/uppercase Latin letters, as well as characters _ - . ", are replaced with %XX, where XX is the character's hex code
http-header	the same as uri
mail-header	if the charset is known (if not, upper/lowercase will not work), the fragment from the first character with the eighth bit set to the end of the string is represented like this: Subject: Re: parser3: =?koi8-r?Q?=D3=C5=CD=C9=CE=C1=D2?=
sql	depends on the SQL server: for Oracle, ODBC and SQLite, ' is replaced with ''; for PgSQL, characters ' and \ are prefixed with \; for MySQL, characters ' " and \ are prefixed with \, and characters with codes 0x00 0x0A 0x0D are replaced with \0 \n \r. For the transformation to work, the code that performs it must be located inside the ^connect[]{} operator.
js	" is replaced with \", ' is replaced with \', \ is replaced with \\, the newline character is replaced with \n, and the character with code 0xFF is preceded by \
json	characters " \ / are prefixed with \, the newline character is replaced with \n, the tab character is replaced with \t, and characters with codes 0x08 0x0C 0x0D are replaced with \b \f \r; for non-UTF-8 output, all Unicode characters are replaced with \uXXXX
regex	characters * \ ^ $ . [ ] | ( ) ? + { } - are prefixed with \
parser-code	special characters are prefixed by ^
xml	& is replaced with &amp;, > is replaced with &gt;, < is replaced with &lt;, " is replaced with &quot;, ' is replaced with &apos;
html	& is replaced with &amp;, > is replaced with &gt;, < is replaced with &lt;, " is replaced with &quot;
optimized-as-is optimized-xml optimized-html	in addition to the replacements of the corresponding type, "white space" (space, tab and newline characters) is optimized: a run of several such characters is replaced with a single one, the one that comes first in the run
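As one worked row of the table, the js transformation can be modelled in Python like this. The function name is hypothetical, and the rule for the character with code 0xFF is deliberately left out of the sketch.

```python
# Replacement pairs taken from the js row of the table.
_JS_REPLACEMENTS = {
    "\\": "\\\\",   # \  becomes \\
    '"': '\\"',     # "  becomes \"
    "'": "\\'",     # '  becomes \'
    "\n": "\\n",    # newline becomes the two characters \ and n
}


def js_escape(text: str) -> str:
    """Escape text so it can sit inside a JavaScript string literal."""
    # Each character is looked up once, so added backslashes are never re-escaped.
    return "".join(_JS_REPLACEMENTS.get(ch, ch) for ch in text)


print(js_escape('He said "hi"\nand left'))
```

Processing the text character by character avoids the ordering problem of chained replacements, where a backslash added for a quote could be escaped a second time.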

A number of taint transformations are performed automatically. For example, file names and paths are always transformed with file-spec, so when you write…

^file::load[filename]

…Parser executes…

^file::load[^taint[file-spec][filename]]

Similarly, when HTTP headers and mail headers are set, Parser performs the http-header and mail-header transformations respectively. During DOM operations, the text parameters of all methods are automatically xml-transformed.
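The file-spec transformation applied to file names can be illustrated with the following Python sketch. The function name is hypothetical, and uppercase hex digits are an assumption, since the table only says _XX.

```python
# Characters that the file-spec row of the table lists for replacement.
_FILE_SPEC_UNSAFE = set('*?"<>|')


def file_spec_escape(name: str) -> str:
    """Replace each listed character with _XX, where XX is its hex code."""
    return "".join(
        f"_{ord(ch):02X}" if ch in _FILE_SPEC_UNSAFE else ch
        for ch in name
    )


print(file_spec_escape("report*.txt"))
```

A wildcard or pipe character arriving in an external file name therefore ends up as a harmless literal sequence in the name that is actually used.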

Parser also performs a number of automatic untaint transformations:
type	what is transformed
sql	body of an SQL query
xml	XML code, while an object of class xdoc is being created
optimized-html	page output to the browser
regex	regex patterns
parser-code	body of operator process

Last updated: 11.04.2024
