taint. Specifying data transformations

^taint[text]
^taint[transformation type][text]

Parser applies automatic data transformations to protect your system against intrusion, and the default security level is high. This protection works even if your code never uses the taint operator. If you interfere by using these operators (especially with the as-is transformation), you may increase the risk of a security vulnerability. Therefore, study the mechanism carefully before writing code.

The taint operator marks the text it receives as "needing a transformation of a certain type". If the transformation type is not specified, taint marks the text as "tainted" (needing an undefined transformation). Text marked "tainted" is subject to the same type of transformation that is applied to external text (coming from form fields, a database, files, cookies, etc.).

Text is only marked for transformation; the transformation itself is performed later, when the apply-taint operator is called or when the text is output to the browser, sent to an SQL server, saved to a file, sent by e-mail, etc.

For simplicity, you can think of it as if Parser treats external data as ^taint[external text], and text typed within the body of your code as ^taint[optimized-as-is][typed text].

Automatic transformations protect against unsafe external data. For example, the SQL query ^string:sql{SELECT name FROM table WHERE uid = '$form:uid'} (again, not using taint) cannot be subverted by an SQL injection through the parameter "?uid=' OR 1=1 OR '", because Parser escapes the single quotes in the received $form:uid before sending the query to the server.
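To make the effect of this escaping concrete, here is a minimal Python sketch that models two of the sql rules listed in the transformation table of this section. It is an illustration only, not Parser's implementation; the function names are hypothetical, and in Parser the transformation is applied by the connected server's driver.

```python
# Illustrative model only: the function names are hypothetical and this is
# not how Parser is implemented internally.

def sql_escape_doubling(value: str) -> str:
    """Oracle, ODBC and SQLite rule from the table: ' is replaced with ''."""
    return value.replace("'", "''")


def sql_escape_mysql(value: str) -> str:
    """MySQL rule from the table: ' " and \\ are prefixed with a backslash,
    and the codes 0x00 0x0A 0x0D become \\0 \\n \\r."""
    special = {"\x00": "\\0", "\n": "\\n", "\r": "\\r"}
    out = []
    for ch in value:
        if ch in special:
            out.append(special[ch])
        elif ch in "'\"\\":
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)


# The injection attempt from the text, as it would arrive in $form:uid.
payload = "' OR 1=1 OR '"
query = "SELECT name FROM table WHERE uid = '%s'" % sql_escape_doubling(payload)
print(query)  # SELECT name FROM table WHERE uid = ''' OR 1=1 OR '''
```

After escaping, the whole payload stays inside one string literal, so the server compares uid with the text instead of evaluating OR 1=1.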

Text within the body is also transformed automatically. Parser optimizes whitespace characters: spaces, tabs and line breaks. A run of consecutive whitespace characters is replaced with the first character of the run. In other words, if you type several spaces, only one reaches the output. If you need to disable this optimization (for example, inside <pre>), do it explicitly, for instance like this:

<pre>
^taint[as-is][
   I strode off the
   high cathedral
    top-most step like a
     miracle worker, or a
      Blessed
       passing the final exam for
        Saint. The
         city expanded at my
          feet. For one
           pico-second, I
            flew.
]
</pre>
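The whitespace optimization that this example switches off can be modeled in a few lines of Python. This is a sketch of the rule as described above, not Parser's code; the function name is hypothetical, and treating both \r and \n as line breaks is an assumption.

```python
import re

# Hypothetical model of the "optimized-*" whitespace rule: every run of
# spaces, tabs and line breaks is replaced with the first character of the run.
# Assumption: both \r and \n count as line-break characters.
_WHITESPACE_RUN = re.compile(r"([ \t\r\n])[ \t\r\n]+")


def optimize_whitespace(text: str) -> str:
    """Collapse each whitespace run to its first character."""
    return _WHITESPACE_RUN.sub(r"\1", text)


print(repr(optimize_whitespace("a   b")))      # 'a b'
print(repr(optimize_whitespace("a\n      b"))) # 'a\nb'
```

Applied to the indented verse above, this rule would remove the leading spaces of every line, which is why the example wraps the text in ^taint[as-is][...].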



Example
$clean[<br />]
# the above expression is equivalent to this: $clean[^taint[optimized-as-is][<br />]]

$tainted[^taint[<br />]]

Strings: ^if($clean eq $tainted){match}{do not match}<br />

Tainted data - '$tainted'<br />
Untainted data - '$clean'<br />

This example shows that although the comparison reports the strings as equal, a browser will display different results: the untainted string is output without transformation, whereas '<' and '>' in the tainted one are replaced with '&lt;' and '&gt;'.
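The behaviour can be pictured with a small Python model: the two values hold the same characters, and only the transformation applied at output time differs. The sketch below uses the html replacements from the transformation table of this section; the function name is hypothetical and the code is not Parser's implementation.

```python
# Replacements of the html transformation, as listed in the table.
HTML_TABLE = str.maketrans({
    "&": "&amp;",
    ">": "&gt;",
    "<": "&lt;",
    '"': "&quot;",
})


def html_transform(text: str) -> str:
    """Apply the html replacements in a single pass over the text."""
    return text.translate(HTML_TABLE)


clean = "<br />"    # typed in the code: output as is
tainted = "<br />"  # marked as tainted: transformed on output

print(clean == tainted)         # True, the stored characters are identical
print(clean)                    # <br />
print(html_transform(tainted))  # &lt;br /&gt;
```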


Example
$city[New York]
<a href="city.html?city=^taint[uri][$city]">$city</a>

As a result, the contents of the variable city are transformed to the uri type. Cyrillic characters, white spaces and other characters that must be encoded are replaced with their hex codes in the form %XX.
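As a rough illustration of the %XX encoding, the following Python sketch percent-encodes every character outside a small safe set. The function name is hypothetical. The sketch assumes that only digits, Latin letters and the characters _ - . stay unencoded and that the output charset is UTF-8; Parser's actual safe set and charset handling may differ.

```python
import string

# Assumption: only these characters are left unencoded.
SAFE = set(string.ascii_letters + string.digits + "_-.")


def uri_transform(text: str, charset: str = "utf-8") -> str:
    """Replace every byte of each unsafe character with %XX (uppercase hex)."""
    out = []
    for ch in text:
        if ch in SAFE:
            out.append(ch)
        else:
            out.extend("%%%02X" % byte for byte in ch.encode(charset))
    return "".join(out)


print(uri_transform("New York"))  # New%20York
```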


Example
Outputting and saving user-submitted data and generating XML<br />
You specify: '$form:field'

^connect[$SQL.connect-string]{
   ^void:sql{INSERT INTO news (body) VALUES ('$form:field')}
}


$doc[^xdoc::create{<?xml version="1.0" encoding="UTF-8"?>
<root>
   <data>$form:field</data>
</root>
}]


In this case you don't need taint: all the necessary transformations occur automatically, with transformation type optimized-html for output to the browser, sql for sending data to the SQL server, and xml for generating the xdoc object.
Note that you also do not need to write taint in SQL queries when saving data to a database from an administrative interface.


Example
Outputting user submitted data or data coming from a database (may contain tags) to an edit form<br />
^if(def $form:body){
   $body[$form:body]
}{
   ^connect[$SQL.connect-string]{
      
$body[^string:sql{SELECT body FROM news WHERE news_id = $id}]
   }
}
<textarea>$body</textarea>

In this example the optimized-html transformation is performed automatically, because data submitted by the user or coming from a database is tainted. If the data contains any tags, they will not affect the page. Remember that sequences of white spaces in $body will be optimized during output.


Example
Outputting data coming from a database containing administrator written tags<br />
^connect[$SQL.connect-string]{
  $body[^string:sql{SELECT body FROM news WHERE news_id = $id}]
}
^taint[as-is][$body]

Here you should use taint with transformation type as-is, because the tags that the administrator included in the news text must not undergo any transformation. This method must not be used for data submitted by website visitors, such as guest book entries, forum posts, etc.


Example
Outputting user submitted data or data coming from a database (may contain tags) to an edit form keeping spacing symbols<br />
^if(def $form:body){
  $body[$form:body]
}{
  
^connect[$SQL.connect-string]{
     
$body[^string:sql{SELECT body FROM news WHERE news_id = $id}]
  
}
}
<textarea>^taint[html][$body]</textarea>

In this case, use taint with transformation type html: it keeps tags in the data from breaking the page and disables the optimization of whitespace characters.


In the above examples the taint operator was used only three times: to display administrator-added tags in database-derived text, to disable optimization of whitespace characters, and to output a query string containing characters that must be encoded (for example, white spaces and Cyrillic letters).
Otherwise, there was no need for taint, and Parser managed everything on its own.

Remember that it is better not to use this operator unless necessary.


A transformation replaces certain characters with others, according to built-in transformation tables. The following types of transformation are available:

as-is
file-spec
http-header
mail-header
uri
sql
js
json   [3.4.1]
parser-code   [3.4.0]
regex

xml
html

optimized-as-is
optimized-xml
optimized-html

Transformation table

as-is
    no transformation

file-spec
    characters * ? " < > | are replaced with _XX, where XX is the character's hex code

uri
    characters other than digits, lowercase and uppercase Latin letters, and the characters _ - . " are replaced with %XX, where XX is the character's hex code

http-header
    the same as uri

mail-header
    if the charset is known (otherwise upper/lowercase will not work), the fragment from the first eight-bit character to the end of the string is represented like this:
    Subject: Re: parser3: =?koi8-r?Q?=D3=C5=CD=C9=CE=C1=D2?=

sql
    depends on the SQL server:
    for Oracle, ODBC and SQLite, ' is replaced with ''
    for PgSQL, characters ' and \ are prefixed with \
    for MySQL, characters ' " and \ are prefixed with \, and characters with codes 0x00 0x0A 0x0D are replaced with \0 \n \r
    for this transformation to work, the code that performs it must be located inside a ^connect[]{} operator

js
    " is replaced with \"
    ' is replaced with \'
    \ is replaced with \\
    the newline character is replaced with \n
    the character with code 0xFF is prefixed with \

json
    characters " \ / are prefixed with \
    the newline character is replaced with \n
    the tab character is replaced with \t
    characters with codes 0x08 0x0C 0x0D are replaced with \b \f \r
    if the output is not UTF-8, all Unicode characters are replaced with \uXXXX

regex
    characters \ ^ $ . [ ] | ( ) ? * + { } - are prefixed with \

parser-code
    special characters are prefixed with ^

xml
    & is replaced with &amp;
    > is replaced with &gt;
    < is replaced with &lt;
    " is replaced with &quot;
    ' is replaced with &apos;

html
    & is replaced with &amp;
    > is replaced with &gt;
    < is replaced with &lt;
    " is replaced with &quot;

optimized-as-is
optimized-xml
optimized-html
    in addition to the replacements of the corresponding base type, "white spaces" (space, tab and newline characters) are optimized:
    a run of several such characters in a row is replaced with a single one, namely the first character of the run
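Most rows of the table are simple per-character rules, so they can be modeled with a lookup. As one example, the Python sketch below applies the regex row and checks that the escaped text, used as a pattern, matches only the literal input. The function name is hypothetical, and Python's re module stands in for Parser's regular expression engine, which is an assumption of this sketch.

```python
import re

# Characters listed in the regex row of the table.
REGEX_SPECIAL = set("\\^$.[]|()?*+{}-")


def regex_transform(text: str) -> str:
    """Prefix every regex special character with a backslash."""
    return "".join("\\" + ch if ch in REGEX_SPECIAL else ch for ch in text)


user_text = "1+1=2 (approx.)"
pattern = regex_transform(user_text)
print(pattern)  # 1\+1=2 \(approx\.\)

# The escaped pattern matches the literal text and nothing looser.
print(bool(re.fullmatch(pattern, user_text)))          # True
print(bool(re.fullmatch(pattern, "11=2 (approxx)")))   # False
```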

A number of taint transformations are made automatically. Thus, names of files and paths are always automatically transformed with file-spec, and when you write…

^file::load[filename]

…Parser executes…

^file::load[^taint[file-spec][filename]]

Similarly, when HTTP headers and mail headers are defined, Parser executes the http-header and mail-header transformations respectively. During DOM operations, text parameters of all methods are automatically xml-transformed.
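For a feel of what the automatic file-spec transformation does to a file name, here is a minimal Python model of the file-spec row of the transformation table. The function name is hypothetical, and writing the hex code in uppercase is an assumption; this is not Parser's implementation.

```python
# Characters that the file-spec row of the table replaces with _XX.
FILE_SPEC_CHARS = '*?"<>|'


def file_spec_transform(name: str) -> str:
    """Replace each listed character with _XX, XX being its hex code.

    Assumption: the hex digits are written in uppercase.
    """
    return "".join(
        "_%02X" % ord(ch) if ch in FILE_SPEC_CHARS else ch
        for ch in name
    )


print(file_spec_transform("report?.txt"))  # report_3F.txt
```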

Parser also performs a number of automatic untaint transformations:

type              what is transformed
sql               body of an SQL query
xml               XML code, while an object of class xdoc is created
optimized-html    page output to browser
regex             regex patterns
parser-code       body of operator process



Copyright © 1997–2025 Art. Lebedev Studio | http://www.artlebedev.com Last updated: 11.04.2024