Theory of Operation
The CQLi tool accepts a PGN file and a CQL query as input and outputs the games from the PGN file that match the provided query. A game matches the query if any positions appearing in the game match the query. A CQL query consists of one or more filters.
For each game processed by CQLi, a game tree is constructed to represent the game and any variations present. Every position in a game has a unique sequential position id starting at 0 for the initial position. Each position in the game is visited once, starting at the initial position and progressing through the positions corresponding to the moves made in the game in order of increasing position id. For each position, the supplied CQL query is evaluated. Each filter in the CQL query is evaluated at the current position and either matches the position or not. As soon as a filter fails to match, evaluation of the current position stops and the query is then applied to the next position. A position matches if all of the filters in the query succeed for the position. After all positions in the game have been evaluated, the game matches the query if at least one position in the game matched.
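The evaluation order described above can be sketched in Python. This is only an illustrative model, not CQLi's implementation: a position is any object, a filter is a plain predicate, and the function names are hypothetical.

```python
from typing import Callable, Iterable, List

# Hypothetical stand-in: a "filter" is a predicate on a position object.
Filter = Callable[[object], bool]

def position_matches(position: object, filters: Iterable[Filter]) -> bool:
    """A position matches only if every filter matches it."""
    for f in filters:
        if not f(position):
            return False  # stop at the first failing filter
    return True

def matching_position_ids(positions: List[object], filters: List[Filter]) -> List[int]:
    """Visit positions in order of increasing id and collect the ids that match."""
    return [pid for pid, pos in enumerate(positions)
            if position_matches(pos, filters)]

def game_matches(positions: List[object], filters: List[Filter]) -> bool:
    """A game matches if at least one of its positions matched."""
    return len(matching_position_ids(positions, filters)) > 0
```

The list of matching ids is kept because the matching positions are what later receive comments in the output.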
Each game that matches the query is saved until processing of all games is complete. When all games have been processed, matching games are sorted according to any provided sort criteria, comments are added for matching positions, tag modifications are applied, and the resulting games are written to the output file.
Running CQLi
CQLi is a command-line application that can be run from a shell or command prompt, or from another program such as a GUI or a batch script. CQLi accepts many options that affect its behavior, such as how many threads to use, PGN output formatting options, and options that allow queries to be specified or modified on the command line. The full set of available options is documented in Commandline Options. The most important options are summarized below.
The -i option is used to specify the name of the input PGN file which contains the games that will be queried. The -o option is used to specify the name of the output PGN file, i.e. the file to which CQLi will write matching games. The output file is truncated first if it already exists. If no -o option is specified, matching games will be written to a file with a name constructed from the provided CQL query file as explained in the description of the -o option. The -i and -o options both require exactly one argument: the name of the respective file. Options and their corresponding arguments are separated by spaces. The final argument of a typical CQLi invocation is the name of the file that contains the CQL query to be evaluated. For example:
cqli -i input.pgn -o output.pgn test.cql
will process the query contained in test.cql for each game in the input file input.pgn writing any matching games to output.pgn. If the provided query file name cannot be found in the current directory or any directories specified by the CL_PATH environment variable, and does not end with an extension, CQLi will attempt to locate the file with the extension .cql appended.
The text of a CQL query may also be specified directly on the command line using the -cql option, for example the command:
cqli -i input.pgn -o output.pgn -cql 'mate parent : check'
will find games where check was answered by checkmate. When specifying a query with -cql that contains spaces or characters special to the shell, the argument to -cql must be quoted, typically by surrounding the entire argument with single quotes.
A Brief Tour of CQL
This section introduces CQL through a series of short examples aimed at the first-time user. Each example adds one new idea to the previous one: finding checkmate, combining filters, using or and not, describing pieces on squares, expressing attack relationships, comparing material, searching both colors with a transform, and adding comments to matching positions. Detailed explanations of every feature used here can be found in later sections of this manual.
Finding Checkmates
The simplest useful CQL query consists of a single word:
mate
This query uses the mate filter to find every game containing a checkmate position. CQL queries are evaluated at each position of every game, and a game matches if any position matches the query. To try this query directly from the command line, invoke CQLi as follows:
cqli -i games.pgn -o results.pgn -cql 'mate'
The same query may also be saved in a file called checkmates.cql and run with:
cqli -i games.pgn -o results.pgn checkmates.cql
CQLi will scan every position in every game in games.pgn and write the games that contain a checkmate to results.pgn.
Combining Filters
A CQL query can contain multiple filters. A position matches only when all filters match. For example, to find games where White delivers checkmate, use the query:
btm
mate
The btm filter matches positions where it is Black to move. Combined with mate, this matches checkmate positions where it is Black’s turn, meaning White just delivered the checkmate. When a query contains multiple filters, the query is only considered to match when all of the filters match the same position. The line breaks are only for readability.
Using or and not
Sometimes a query should match one of several alternatives rather than requiring every filter to match at once. For example:
mate or stalemate
This query finds games containing either a checkmate position or a stalemate position. The or filter matches when either operand matches.
To exclude a condition, use the not filter:
check
not mate
This matches positions where the side to move is in check but not checkmated. As previously noted, all filters in the query must match the same position for that position to be considered a match.
Describing Pieces on Squares
Piece designators provide a mechanism to express the arrangement of pieces on the board. For example, the query:
mate
Qh7
kg8
finds checkmate positions where a white queen is on h7 and the black king is on g8. The filter Qh7 means “a white queen on h7” and kg8 means “a black king on g8”. Combined with mate, this shows how a CQL query can describe a concrete board pattern.
Describing Attacks
Attack relationships are another common way to describe a position. For example, the query:
k attackedby Q
finds positions where the black king is attacked by a white queen. More precisely, the expression X attackedby Y means “the squares in X that are attacked by the pieces in Y”. Since k refers to the square occupied by the black king, this query matches when that square is attacked by a white queen.
This can be combined with other filters just like the earlier examples:
btm
k attackedby Q
This matches positions where it is Black to move and the black king is attacked by a white queen. In standard chess, that means Black is in check from a queen. The next example further refines the pattern by adding a material condition.
Inspecting Material
The power filter calculates the total material value of the pieces on a given set of squares (pawns = 1, knights/bishops = 3, rooks = 5, queens = 9). Capital letters represent white pieces and lowercase letters represent black pieces. A represents all white pieces and a represents all black pieces.
The following query finds checkmates where Black had a large material advantage:
btm
mate
power a - power A >= 8
The expression power a - power A >= 8 computes the difference in material between Black and White and matches when Black is ahead by at least 8 points (roughly a rook and a bishop). This can be read as: “Black has at least eight more points of material than White.” Combined with the mate filter, this finds games where White won by checkmate despite being significantly behind in material.
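The material arithmetic behind this query can be illustrated with a small Python model. It is a sketch under stated assumptions: the board is a hypothetical mapping from square names to piece letters, the function names are invented for illustration, and scoring kings as 0 is an assumption that the text above does not state.

```python
from typing import Dict

# Material values quoted in the text. Scoring kings as 0 is an assumption.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}

def power(board: Dict[str, str], white: bool) -> int:
    """Sum the material of one color; uppercase letters are white pieces."""
    total = 0
    for piece in board.values():
        if piece.isupper() == white:
            total += PIECE_VALUES[piece.lower()]
    return total

def black_ahead_by(board: Dict[str, str], margin: int) -> bool:
    """Model of the comparison `power a - power A >= margin`."""
    return power(board, white=False) - power(board, white=True) >= margin
```

For a board holding a white king and pawn against a black king, queen, and rook, the difference is 14 - 1 = 13, so the condition with a margin of 8 holds.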
Using Transforms
Many search patterns apply equally to both sides. Rather than writing two separate queries, the flipcolor filter automatically searches for the pattern as written and with colors reversed:
flipcolor {
btm
mate
power a - power A >= 8
}
Here the braces group the earlier filters so that flipcolor applies to the whole pattern. This single query finds checkmates by either side while behind in material. Without flipcolor, only checkmates by White would be found.
Adding Comments to Matches
By default, CQLi marks matching positions with a comment in the output PGN. The comment filter lets you add custom annotations. For example:
flipcolor {
btm
mate
power a - power A >= 8
comment "Checkmate despite material deficit"
}
This is the same search as above, but with a custom comment added to each matching position. The matching games in the output file will contain the comment Checkmate despite material deficit at the position where the checkmate occurs, making it easy to navigate to the relevant position in a chess GUI.
Where to Go from Here
The examples above only scratch the surface of what can be expressed with CQL queries. The remainder of this manual describes the full set of filters and features available, including:
- Piece Designators for specifying which pieces and squares to examine.
- The attackedby and attacks filters for expressing attack relationships.
- The move filter for inspecting individual moves, including captures, promotions, and castling.
- The line filter for finding complex move sequences using repetition patterns borrowed from regular expressions.
- Transform Filters for finding patterns regardless of board orientation, color, or location.
- Imaginary Position Exploration for analyzing positions that were never actually reached in the game.
- The Synoptic Examples and Expository Examples sections which contain a collection of practical queries and detailed walkthroughs.
Basic Concepts
Source Comments
CQL supports two types of comments: block comments and line comments. Block comments are introduced by the character sequence /* and terminated by the sequence */. Block comments do not nest. Line comments begin with the sequence // and continue until the end of the line. Comments may appear anywhere between tokens in a CQL query and are removed during parsing.
Filters and Types
Filter is the term used to refer to any component of a CQL query that is evaluated. Function definitions, function calls, variable assignment, operators, operands, intrinsic operations, looping constructs and even literal values are all filters.
Every filter has a static type and evaluation of the filter will either yield a value of that type, or the special None value which represents the absence of a value.
There are several distinct types in CQLi:
- Boolean - can represent the values true and false
- String - represents a string of UTF-8 characters
- Numeric - represents a 64-bit integral value
- Set - represents a set of zero or more chessboard squares
- Position - represents a specific position in a game
- Piece Identity - represents the identity of a piece (see Piece Tracking)
- Dictionary - a collection of key/value pairs
A value is said to “match the position”, or simply “match”, if it is not the special None value, the false Boolean value, or an empty Set value. This is separate from the question of whether a filter produced a value at all: a filter may yield a value and still not match. In particular:
- None does not match.
- false does not match.
- [] does not match.
- 0 does match.
- "" does match.
This distinction is a key concept in CQL. For example, comparison, assignment, and string conversion filters may all yield values that do not match the position.
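The matching rule can be modeled in a few lines of Python. This is an illustrative sketch only: Python's None, False, and empty set stand in for the corresponding CQL values, and the function name matches is hypothetical.

```python
from typing import Any

def matches(value: Any) -> bool:
    """Return True if a filter result would count as matching the position."""
    if value is None:      # stands in for CQL's None
        return False
    if value is False:     # the false Boolean value
        return False
    if isinstance(value, (set, frozenset)) and len(value) == 0:
        return False       # the empty Set value
    return True            # everything else matches, including 0 and ""
```

Note that the test is deliberately not Python's own truthiness: 0 and the empty string are falsy in Python but count as matching here.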
Literals
A literal is a single constant value that is known at parse time. Literals are supported for the Boolean, String, Numeric, and Set types. The literal Boolean values are true and false. String literals are any text (except for double quotes) surrounded by double quote characters. Numeric literals consist of one or more digits. Set literals are specified using square designators such as a1, a-h1-2, or [a1-6,d5,f6]. The special designator . represents all squares. The shortest CQL query is just . which matches every position of every game and is functionally equivalent to a CQL query of true.
Variables
Variables may hold values of any type except for Boolean. The type of a variable is static: the value initially assigned to a variable determines its permanent type (except for piece variables and dictionary variables, which are declared using separate keywords). The names of variables may consist of letters, digits, underscores, and the $ character, and may not begin with a digit. There is no limit to the length of the name of a variable and all characters are significant.
A variable may not have the same name as a keyword or a sequence that would represent a piece designator. It is good practice to start variable names with either an uppercase letter or a $ to prevent accidental collision with reserved names and to serve as a visual cue to differentiate variables from other names.
Variable names are case sensitive so var, Var, and VAR refer to three distinct names. Names starting with __CQL are reserved by the CQL implementation.
Variables are assigned a value using the = filter. Assignment is how variables are declared in CQL; the type of the variable is inferred from the value initially assigned to it. It is a syntax error to reference a variable before it is declared. This rule applies at the point where the reference is actually processed. In particular, function bodies are checked when the function is invoked, not when it is defined.
The assignment filter (=) yields a Boolean value of true unless the value to be assigned is None in which case the value of the variable is unchanged and the result of assignment is false. For example, the query:
$check_pos = find check
will find the first position, starting with the current position, where one side is in check and assign this position to $check_pos. If no such position is found, the find filter yields None, $check_pos is not assigned, and the assignment itself does not match the position.
Conditional Set Assignment
An empty set value (e.g. []) does not match the position but it can still be assigned to a variable, and a plain assignment of [] will itself match the position. The =? conditional set assignment filter may be used to assign a Set variable only if the provided value is not the empty set. In other words, = stores empty sets while =? skips them.
For example:
X = [] // X holds the empty set
X = a1 // X holds the value a1
X =? [] // X is not modified
The =? filter yields true if the variable was assigned (possibly with the same value it already held) and false otherwise. When =? yields false, the variable is left unchanged. This matters when =? appears as one step in a sequence of filters: the assignment may leave the variable unchanged and yet still cause the enclosing sequence to fail to match.
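The difference between = and =? can be sketched as a Python model in which variables live in an ordinary dictionary. The helper names are hypothetical, and treating a None value as a failed =? is an assumption made by analogy with plain assignment.

```python
from typing import Dict, FrozenSet, Optional

Squares = FrozenSet[str]

def assign(variables: Dict[str, Squares], name: str,
           value: Optional[Squares]) -> bool:
    """Model of `=`: stores any value except None, including the empty set."""
    if value is None:
        return False
    variables[name] = value
    return True

def assign_if_nonempty(variables: Dict[str, Squares], name: str,
                       value: Optional[Squares]) -> bool:
    """Model of `=?`: stores the value only if it is a non-empty set."""
    if value is None or len(value) == 0:  # None handling is an assumption
        return False                      # variable is left unchanged
    variables[name] = value
    return True
```

Replaying the three assignments from the example with these helpers leaves X holding a1 after the final, skipped =? step.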
Compound Assignment
Variables can be modified using the simple assignment filter = or the compound assignment filters +=, -=, *=, /=, %=, |=, and &=. Like simple assignment, if the RHS of a compound assignment filter is None, the variable is not modified and the assignment does not match the position.
Unbound Variables
An unbound variable is one whose value is None, i.e. it does not hold a value. A variable of any type except Dictionary type may be unbound. This can occur if the variable has not yet been assigned such as if the initializing declaration is skipped. For example:
if 1 == 2 then X = 1
str(X) // yields the string "<None>", X is unbound
A variable may be explicitly unbound using the unbind filter, which takes the name of a previously declared variable as its only argument, e.g.:
unbind X
The isbound filter takes a single identifier as an argument and yields true if the identifier corresponds to a bound variable and false otherwise (the specified identifier refers to an unbound variable or does not refer to a variable at all). The isunbound filter operates similarly but yields false if the provided identifier is a bound variable and true otherwise. For example:
X = 1
Y = 1
unbind Y
isbound X // true
isbound Y // false
isbound Z // false
isunbound X // false
isunbound Y // true
isunbound Z // true
Dictionary Variables
A dictionary holds a set of key-value pairs. The possible types for keys and values are Numeric, String, and Set. The types of the keys and values are set when the dictionary is declared. A dictionary variable is declared using the dictionary keyword; no initializer is provided with the declaration. An optional type specifier of the form:
{str|int|set}-->{str|int|set}
may be provided, where the type to the left of the arrow specifies the key type and the type to the right of the arrow specifies the value type.
For example, the following filter declares a dictionary variable named Dx with keys of Numeric type and values of Set type:
dictionary int --> set Dx
If the optional type specifier is not present, the dictionary will have keys and values of type String. Dictionary variables are persistent; their values are maintained across games.
Dictionary Entry Access and Assignment
Given a dictionary variable Dx, the key access filter Dx[key] yields the value of key stored in Dx or None if key does not exist in Dx. Accessing a missing key never creates it.
The key assignment filter Dx[key] = value inserts key into Dx with the value value, or replaces the existing value if key is already present, provided that neither key nor value is None. If key or value is None, the key assignment filter does not match the position and Dx is not modified.
In particular, note that:
- Dx[key] on a missing key yields None and does not create an entry.
- Dx[key] = value inserts or replaces an entry unless key or value is None.
- Dx[key] =? value (for Set-valued dictionaries) inserts or replaces an entry only if value is a non-empty set.
- Dx[key] += ... and the other compound assignment operators may create a missing key first; see below.
Compound Assignment and Key Autovivification
The compound assignment operators (+=, -=, *=, /=, %=, |=, and &=) may be used with dictionaries that have appropriate value types. If the specified key does not exist in the dictionary, it will first be created (vivified) with a default value (0 for Numeric values, [] for Set values, and "" for String values) unless the filter will evaluate to false (which occurs when the right-hand side of the compound assignment is None or the right-hand side of /= or %= is zero). For example, Dx[key] += 1 will increment the value of Dx[key]; if key did not exist in Dx, it will first be added with a value of 0 and then incremented to 1.
This autovivification of entries only occurs when using the compound assignment operators. Simply attempting to access a key that does not exist with Dx[key] will not create a corresponding entry in Dx, and neither will a failing simple assignment or conditional set assignment.
To increment the value of a key that already exists in a dictionary, without creating it if it does not exist, use if Dx[key] then Dx[key] += 1 or Dx[key] = Dx[key] + 1 instead.
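The autovivification rules can be modeled with a Python dict standing in for a Numeric-valued dictionary. This is a sketch, not CQLi's implementation: the helper names are hypothetical, and the use of Python's floor division is an assumption that only agrees with "integral portion of the quotient" for non-negative operands.

```python
from typing import Dict, Optional

def lookup(d: Dict[str, int], key: str) -> Optional[int]:
    """Model of `Dx[key]`: yields None for a missing key and never creates it."""
    return d.get(key)

def add_assign(d: Dict[str, int], key: str, rhs: Optional[int]) -> bool:
    """Model of `Dx[key] += rhs`."""
    if rhs is None:
        return False                # filter fails; no entry is created
    d[key] = d.get(key, 0) + rhs    # a missing key is vivified with 0 first
    return True

def div_assign(d: Dict[str, int], key: str, rhs: Optional[int]) -> bool:
    """Model of `Dx[key] /= rhs`: a zero divisor fails without vivifying."""
    if rhs is None or rhs == 0:
        return False
    d[key] = d.get(key, 0) // rhs   # floor division; see the assumption above
    return True
```

In this model, only a compound assignment that succeeds ever adds a key, which mirrors the rule that failing assignments and plain lookups leave the dictionary untouched.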
Conditional Set Assignment with Dictionaries
The conditional set assignment operator =? can be used with dictionaries having Set values with the same semantics that are employed for regular variables. E.g. Dx[key]=? value will assign value to Dx[key] only if value is not the empty set. If value is empty and key does not exist in Dx, it is not created with a default value.
Removing Keys from Dictionaries
An unbind filter of the form unbind Dx[key] will remove key from Dx. If unbind is applied to a dictionary variable, e.g. unbind Dx, all of the keys are removed from the variable (but the variable itself is not unbound, dictionary variables never have a value of None).
Dictionary Cardinality and Iteration
The number of keys in a dictionary can be obtained using the cardinality filter #, e.g. #Dx will yield the number of key-value pairs present in Dx. The keys in a dictionary may be iterated using the key iteration filter.
Dictionary Access Caveats
For dictionary variables having keys of Set type, care must be taken when attempting to use an unbracketed piece designator as the key in a dictionary access or assignment filter. The parser may otherwise interpret the text as a piece designator instead of a dictionary key.
For example, Dx[a2] += 1 will produce a syntax error because it is interpreted as two separate filters: Dx and [a2] += 1. To force dictionary-key parsing, one of the following disambiguation techniques may be used:
- Enclose the piece type and/or square designator portions in square brackets. For example, instead of Dx[Ka1-8] use Dx[[K]a1-8], Dx[K[a1-8]], or Dx[[K][a1-8]]. This prevents the outer bracketed expression from being confused with a piece designator (since nested brackets are not permitted in piece designators).
- Add a space character before and/or after the piece designator. For example, instead of Dx[Ka1-8] use Dx[ Ka1-8 ].
- Prefix the piece designator with the identity operator. For example, instead of Dx[Ka1-8] use Dx[`Ka1-8].
In short, for a Set-keyed dictionary:
- Dx[a2] is ambiguous and should be avoided.
- Dx[ a2 ], Dx[`a2], and Dx[[a2]] are unambiguous dictionary accesses.
Similarly, if the name of a variable would be interpreted as a compound piece type designator when enclosed in square brackets without spaces (i.e. the name consists entirely of characters in the set QqBbRrNnKkPpAa_), one of the last two disambiguation techniques above must be used when the variable serves as a key in a dictionary access or assignment filter. For example, instead of Dx[Rank] use Dx[ Rank ] or Dx[`Rank].
Persistent Variables
The values of variables are reset to None at the beginning of every game unless the variables are declared with the persistent keyword (which immediately precedes the variable name), in which case their values persist across all games. Numeric, Set, and String variables may be declared as persistent. Dictionary variables are always persistent but may not be declared using the persistent keyword. Persistent variables are initialized once, prior to evaluating the first position in the first game. Persistent Numeric variables are initialized to zero, Set variables to the empty set, and String variables to the empty string. Persistent variables may be declared using a compound assignment operator, in which case the type of the variable is deduced from the RHS of the compound assignment. For example:
persistent $total_positions += 1
declares a persistent Numeric variable named $total_positions that is incremented for every evaluated position. Persistent variables may be unbound using the unbind filter, in which case they are never implicitly re-initialized.
The final values of non-dictionary persistent variables are emitted at the end of processing. The option --showdictionaries may be used to cause dictionary variables to be emitted along with other persistent variables.
Persistent and dictionary variables may be declared using the quiet keyword to suppress emission of their values after processing. This is useful for suppressing the output of potentially very long string variables. E.g.:
persistent quiet $str = ""
Variable Scopes
Every variable exists within a scope which dictates the portion of the query for which the variable is visible (may be referenced). In CQLi, all variables are placed in either the global scope or a block scope. Block scopes are introduced by function invocations and iteration filters (echo, loop, piece, square, string, and while) and extend to the end of the respective filter.
Names appearing in a function body are resolved when the function is invoked. However, the usual rule still applies at that point: every referenced variable must already have been declared before the reference is processed.
Persistent variables and variables declared outside of a function or iteration filter exist in the global scope. Parameter variables (parameters declared in a function parameter list and the iteration variables of piece, string, echo, and square filters) always exist within their corresponding block scope, shadowing any variable of the same name in an enclosing scope.
Other (non-persistent, non-parameter) variables used in a block scope either refer to a variable in an enclosing scope or create a new variable in the current block scope. While variables within an enclosing scope may be accessed in a block scope, variables created in a block scope may not be accessed after the end of that scope. Block scopes may be nested and all block scopes are enclosed by the global scope.
The following example illustrates the key points described above:
$param = "abc" // $param created in global scope.
$called = 0 // $called created in global scope.
function foo($param) { // $param shadows variable in enclosing scope.
$called = 1 // Refers to the above $called variable.
$local = abs $param // $local is not accessible outside of foo.
$local + $global // $global will be accessible at invocation.
}
$global = 10 // $global resides in the global scope.
$result = foo(5) // foo can access $global from here.
The first two lines declare a String variable named $param and a Numeric variable named $called, both in the global scope. Function foo declares a parameter variable $param, a block scope variable which will shadow the global variable $param within the function invocation. The body of the function is not processed until the function is invoked. When this happens at the end of the example, the first line of the body of foo modifies the global variable $called; it does not create a new variable because the enclosing scope already has a variable named $called and within foo the variable is not a parameter variable.
Since there is not already a variable named $local at the point where foo is invoked, the $local variable will be installed in foo’s block scope and cannot be referenced outside of the function. The variable $global does not exist when foo is defined but does exist when foo is invoked so the reference in foo to $global will be that of the existing global scope variable. Note that if foo was called before $global had been declared a parse error would result. In other words, a function body may refer to a global declared later in the file, but only if that declaration has been processed before the function is invoked.
Piece Designators
Piece designators are used to identify squares on a chessboard such as the squares where certain pieces reside. The piece designator . represents all 64 squares while the piece designator [] represents none of the squares, i.e. it is the complement of the . piece designator. All other piece designators consist of a piece type designator and/or a square designator.
Piece Type Designators
A simple piece type designator specifies a single class of chess pieces. The table below lists the simple piece type designators used in CQL.
| Designator | Description | Designator | Description |
|---|---|---|---|
| K | White king | k | Black king |
| Q | White queen | q | Black queen |
| R | White rook | r | Black rook |
| B | White bishop | b | Black bishop |
| N | White knight | n | Black knight |
| P | White pawn | p | Black pawn |
| A | Any white piece | a | Any black piece |
| _ | Unoccupied | | |
For example, the piece designator Q represents the squares on which a white queen resides in the position currently being evaluated. Multiple simple piece type designators may be combined to form a compound piece type designator which consists of one or more simple piece type designators enclosed in square brackets. For example, [Qq] represents squares occupied by white and black queens and [_a] represents the set of squares not occupied by white pieces, in the current position.
Square Designators
A square designator refers to specific squares on a chessboard by their files and/or ranks. A simple square designator consists of a file designator followed by a rank designator. A file designator is either a file name or two file names separated by a hyphen. A rank designator is either a rank name or two rank names separated by a hyphen. Valid file names are a, b, c, d, e, f, g, and h. Valid rank names are 1, 2, 3, 4, 5, 6, 7, and 8.
Examples of simple square designators include a1, e5, a-h8 (the squares on the 8th rank), and d-e4-5 (the four center squares). Note that a simple square designator must consist of both a file designator and a rank designator; e.g. b is a piece type designator, not a square designator. Use b1-8 to refer to all the squares in the b file.
Multiple simple square designators may be combined to form a compound square designator which consists of one or more simple square designators, separated by commas, enclosed in square brackets. For example, [a1,h8] represents the squares a1 and h8, while [a1-8,a-h8,d4] represents the union of d4, the a file, and rank 8.
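The expansion of square designators into squares can be sketched in Python. This is an illustrative model with hypothetical function names; it assumes well-formed input with ranges written in ascending order and does not attempt to reproduce CQLi's parser.

```python
import re
from typing import Set

FILES = "abcdefgh"
RANKS = "12345678"

# file or file range, followed by rank or rank range, e.g. "a1", "a-h8", "d-e4-5"
_SIMPLE = re.compile(r"([a-h])(?:-([a-h]))?([1-8])(?:-([1-8]))?")

def expand_simple(designator: str) -> Set[str]:
    """Expand a simple square designator into the set of squares it names."""
    m = _SIMPLE.fullmatch(designator)
    if m is None:
        raise ValueError(f"not a simple square designator: {designator!r}")
    f1, f2, r1, r2 = m.groups()
    files = FILES[FILES.index(f1): FILES.index(f2 or f1) + 1]
    ranks = RANKS[RANKS.index(r1): RANKS.index(r2 or r1) + 1]
    return {f + r for f in files for r in ranks}

def expand_compound(designator: str) -> Set[str]:
    """Expand a bracketed, comma-separated list as the union of its parts."""
    inner = designator.strip()[1:-1]
    squares: Set[str] = set()
    for part in inner.split(","):
        squares |= expand_simple(part.strip())
    return squares
```

With this model, [a1-8,a-h8,d4] expands to 16 squares: eight on the a file, eight on rank 8 (sharing a8), plus d4.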
Combining Piece Type and Square Designators
A piece designator may consist of both a piece type designator and a square designator, either of which may be a compound designator, in which the piece type designator precedes the square designator. For example, Ra-h8 represents the squares occupied by white rooks on the 8th rank and [Kk][a1,a8,h1,h8] represents corner squares occupied by a king of either color.
A piece designator represents the squares that are given by the intersection of its piece type designator and its square designator; if either one is missing, all squares are implied for the missing component. For example, Ra1 is equivalent to R & a1. Compound designators represent the union of each simple designator in the bracketed list, e.g. [a1-8,a-h8,d4] is equivalent to a1-8 | a-h8 | d4 and [Kk][a1,a8,h1,h8] is equivalent to (K | k) & (a1 | a8 | h1 | h8) (except when appearing in shift transforms).
Piece designators are a cornerstone of the CQL language and provide a concise mechanism to articulate a set of squares using both the static nature of the chess board (with square designators) and the dynamic nature of piece occupancy (using piece type designators).
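The intersection and union semantics can be illustrated with Python sets. In this sketch the board is a hypothetical mapping from occupied squares to piece letters, the function names are invented for illustration, and the special handling in shift transforms is not modeled.

```python
from typing import Dict, Optional, Set

ALL_SQUARES: Set[str] = {f + r for f in "abcdefgh" for r in "12345678"}

def type_matches(designator: str, piece: Optional[str]) -> bool:
    """Test one simple piece type designator against a square's contents."""
    if designator == "_":
        return piece is None        # unoccupied square
    if piece is None:
        return False
    if designator == "A":
        return piece.isupper()      # any white piece
    if designator == "a":
        return piece.islower()      # any black piece
    return piece == designator      # a specific piece such as 'Q' or 'k'

def piece_designator(board: Dict[str, str],
                     types: Optional[str] = None,
                     squares: Optional[Set[str]] = None) -> Set[str]:
    """Intersect the piece type part with the square part.

    A missing part implies all squares; several letters in `types`
    behave like a compound designator (the union of each letter).
    """
    by_type = ALL_SQUARES if types is None else {
        sq for sq in ALL_SQUARES
        if any(type_matches(t, board.get(sq)) for t in types)
    }
    by_square = ALL_SQUARES if squares is None else squares
    return by_type & by_square
```

For example, piece_designator(board, "Kk", {"a1", "a8", "h1", "h8"}) plays the role of [Kk][a1,a8,h1,h8], and calling it with neither part corresponds to the . designator.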
Arithmetic Operators
CQL provides the following arithmetic operator filters for operating on numeric types:
| Operator | Description | Example | Result |
|---|---|---|---|
| + | Addition | 4 + 5 | 9 |
| - | Subtraction | 12 - 7 | 5 |
| * | Multiplication | 5 * 5 | 25 |
| / | Integer Division | 10 / 3 | 3 |
| % | Modulus | 10 % 3 | 1 |
Each of the above operators is a binary infix filter that accepts two numeric arguments and yields a numeric result. The result matches the position unless one of the operands does not match the position or a value of zero is used as the right-hand argument to / or %. Note that division yields only the integral portion of the quotient as CQL does not have a fractional type.
Arithmetic Intrinsics
The following arithmetic intrinsic filters are available:
| Filter | Description | Example | Result |
|---|---|---|---|
| abs | Absolute value | abs -10 | 10 |
| max | Maximum of multiple values | max(4 2 7) | 7 |
| min | Minimum of multiple values | min(-5 -2) | -5 |
| sqrt | Square root | sqrt 10 | 3 |
The sqrt filter does not match the position if its argument has a negative value. Otherwise, all of the above filters match the position if at least one of their operands does. The abs and sqrt filters take exactly one argument which does not need to be parenthesized. The max and min filters require at least two arguments and the argument list must be parenthesized.
The result of the abs filter is the absolute value of its argument. The result of the sqrt filter is the integer portion of the square root of its argument. The min and max filters yield the smallest or largest value, respectively, of their argument list, ignoring arguments that do not match the position.
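The behavior of these filters can be modeled in Python, with None standing in for a result that does not match the position. The function names are hypothetical, and the division helper uses Python's floor division, which is an assumption that is only safe for non-negative operands.

```python
import math
from typing import Optional

def cql_div(a: Optional[int], b: Optional[int]) -> Optional[int]:
    """Integer division; fails on a missing operand or a zero divisor."""
    if a is None or b is None or b == 0:
        return None
    return a // b  # assumption: operands are non-negative

def cql_sqrt(x: Optional[int]) -> Optional[int]:
    """Integer portion of the square root; fails for negative arguments."""
    if x is None or x < 0:
        return None
    return math.isqrt(x)

def cql_max(*args: Optional[int]) -> Optional[int]:
    """Largest argument, ignoring arguments that do not match."""
    values = [a for a in args if a is not None]
    return max(values) if values else None

def cql_min(*args: Optional[int]) -> Optional[int]:
    """Smallest argument, ignoring arguments that do not match."""
    values = [a for a in args if a is not None]
    return min(values) if values else None
```

These reproduce the table entries above: cql_sqrt(10) is 3, cql_max(4, 2, 7) is 7, and cql_min(-5, -2) is -5.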
Comparison Filters
Comparison filters can be used to compare Numeric, Set, String, and Position filters.
| Filter | Description | Example | Result |
|---|---|---|---|
| == | Test for equality | a1 == flipcolor a8 | a1 |
| | | "abc" == "ABC" | None |
| != | Test for inequality | "a" != "ab" | true |
| | | 10 != abs -10 | false |
| < | Less than | 10 < 20 | 10 |
| | | 10 < 5 | None |
| <= | Less than or equal to | 10 <= 10 | 10 |
| | | 2 <= 1 | None |
| > | Greater than | 20 > 10 | 20 |
| | | "a" > "ab" | None |
| >= | Greater than or equal to | 20 >= 20 | 20 |
| | | "ab" >= "a" | "ab" |
All of the comparison filters may be used with two Numeric, String, or Position operands, or with one Numeric operand and one Set operand in which case the Set operand is implicitly converted to a Numeric value that represents the set’s cardinality (the number of squares in the set). The == and != filters may additionally be used with two Set arguments.
The ==, <=, <, >=, and > filters have the same type as their operands and yield the value of the left-hand operand if the corresponding comparison holds and None if it does not. As the comparison filters are also right-to-left associative, they may be chained to express an n-ary relationship. For example $a < $b < $c will yield the value of $a if $b has a value between $a and $c and None otherwise. These filters always yield None if one of their operands is None.
When used with Numeric operands, the standard mathematical relationships are applied. When used with String operands, a Unicode-aware collated comparison is performed. When used with Position operands, the filters check for an ancestral relationship between the positions.
The != filter yields a Boolean value and X != Y is equivalent to not X == Y for any X and Y. In particular, if X or Y is None, the result of X != Y will be true, even if both X and Y are None.
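The following Python sketch models the result rules above for Numeric operands. It is an illustration under stated assumptions rather than CQLi code: the helper names lt, eq, and ne are hypothetical, and None represents a filter that does not match the position.

```python
from typing import Optional


def lt(a: Optional[int], b: Optional[int]) -> Optional[int]:
    """Model of '<': yield the left operand if a < b, otherwise None."""
    if a is None or b is None:
        return None
    return a if a < b else None


def eq(a: Optional[int], b: Optional[int]) -> Optional[int]:
    """Model of '==': yield the left operand if equal, otherwise None."""
    if a is None or b is None:
        return None
    return a if a == b else None


def ne(a: Optional[int], b: Optional[int]) -> bool:
    """Model of '!=': defined as 'not X == Y', so it is always Boolean."""
    return eq(a, b) is None


# Right-to-left association: $a < $b < $c is evaluated as $a < ($b < $c).
print(lt(1, lt(2, 3)))
print(ne(None, None))
```

Because the inner comparison yields its left operand on success, the outer comparison sees the value of $b, which is what makes chaining work.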
Logical Operators
CQL provides the and, or, and not logical operator filters. The and and or filters are binary infix operators and not is a prefix operator. The and filter matches the position if both its LHS and RHS operands match the position, and the or filter matches the position if either of its operands matches the position. The not filter matches the position if its operand does not. The and and or filters employ short-circuited evaluation, i.e. the RHS of and is not evaluated if the LHS does not match the position and the RHS of or is not evaluated if the LHS matches the position.
Set Operators
CQL provides the following set operator filters:
| Filter | Description | Example | Result |
|---|---|---|---|
| \| | Set union | a1 \| a8 | [a1,a8] |
| & | Set intersection | a1-8 & a-h3-4 | [a3,a4] |
| ~ | Set complement | ~. | [] |
| # | Set cardinality | #[c-f3-6] | 16 |
| in | Set inclusion | c3 in [a1,b2,c3] | true |
The | and & filters are binary infix filters that accept two set filter arguments A and B. A | B yields the union of sets A and B (A ∪ B or the set of squares that are present in either A or B) and A & B yields the intersection of sets A and B (A ∩ B or the set of squares that are present in both A and B). The ~ and # filters are unary prefix filters that accept one set filter argument. ~A yields a set containing the squares not present in A and #A yields a numeric value representing the number of squares present in A. Because the filter . yields a set of all the squares, ~. yields the empty set.
The in binary infix filter accepts two set filter arguments. A in B yields true if A is a subset of B (A ⊆ B or every square in A is also in B) and false otherwise and is equivalent to A & B == A. The query A in B and A != B can be used to determine if A is a proper subset of B (A ⊂ B, or A is a non-identical subset of B). Note that A in B is always true when A is the empty set which should be considered when A could potentially be empty. In particular, use caution when a piece variable is used on the LHS of the in filter. Piece variables are automatically converted to a set representing the location of the piece but this set could be empty if the piece is not currently on the board (i.e. it was previously captured). For example, the intention of the query below is to find all positions where a white pawn has just promoted:
initial
piece Pawn in P {
find Pawn in a-h8
}
However, it also matches positions where a white pawn was captured because Pawn converts to the empty set in such positions, causing Pawn in a-h8 to evaluate to true. In this case, the correct solution is to use &:
initial
piece Pawn in P {
find Pawn & a-h8
}
so that when Pawn is empty, the result of Pawn & a-h8 will be the empty set which will not match the position.
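The empty-set pitfall can be reproduced with ordinary Python sets. This is a model only: squares are represented as strings, set_in is a hypothetical helper standing in for the in filter, and an empty intersection stands in for a filter that does not match.

```python
def set_in(a: frozenset, b: frozenset) -> bool:
    """Model of 'A in B': true when A is a subset of B."""
    return a <= b


last_rank = frozenset(f"{file}8" for file in "abcdefgh")  # models a-h8

promoted = frozenset({"e8"})   # pawn now stands on e8
on_board = frozenset({"e5"})   # pawn still on e5
captured = frozenset()         # pawn no longer on the board

for name, pawn in [("promoted", promoted), ("on_board", on_board), ("captured", captured)]:
    # 'in' is true for the empty set; the intersection is empty (no match).
    print(name, set_in(pawn, last_rank), bool(pawn & last_rank))
```

Only the intersection distinguishes the captured pawn from the promoted one, which is why & is the safer choice when the left-hand set may be empty.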
Other Set Operations
CQL does not provide a set difference operator but the set difference A ∖ B (the squares in A that do not exist in B) can be calculated using A & ~B. The symmetric difference or disjunctive union (aka exclusive OR or XOR) A △ B is the set of squares that exist in exactly one of the sets and can be calculated with either (A & ~B) | (B & ~A) or (A | B) & ~(A & B).
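These identities can be checked with Python sets, assuming a 64-square universe for the complement. The names ALL, complement, A, and B are illustrative only.

```python
from itertools import product

# The universe of squares, modelling the CQL filter '.'
ALL = frozenset(f + r for f, r in product("abcdefgh", "12345678"))


def complement(s: frozenset) -> frozenset:
    """Model of '~S': every square not present in S."""
    return ALL - s


A = frozenset({"a1", "b2", "c3"})
B = frozenset({"b2", "c3", "d4"})

difference = A & complement(B)                      # A \ B
xor_first = (A & complement(B)) | (B & complement(A))
xor_second = (A | B) & complement(A & B)

print(sorted(difference), sorted(xor_first), sorted(xor_second))
```

Both symmetric-difference formulas produce the same set, so the choice between them is a matter of readability.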
Position Operators
With-position Filter
The with-position filter (:) is a binary infix filter taking a LHS Position operand and an arbitrary RHS filter. When the with-position filter is evaluated, the current position is set to the position specified by the LHS operand, the RHS filter is evaluated, and the current position is then restored. The result of the with-position filter has the type and value of the result of evaluating the RHS filter unless the LHS operand does not refer to a valid position in which case the filter yields None.
It is helpful to think of X:Y as “evaluate Y as if the current position were X”. If X does not name a real position in the current game, nothing on the RHS is evaluated and the entire filter yields None.
The with-position filter has several common use cases including accessing the source position in an echo filter and accessing previously saved positions, including imaginary positions.
Positional Intersection
When the & operator is used with two Position operands, the result is the positional intersection of the positions. The positional intersection of two positions is a set that represents the squares which are either empty in both positions or occupied by pieces of the same type and color.
For example, the query:
echo (source target) {
differences = ~(source & target)
differences > 1
differences == A & source:_
}
will find positions that are identical except that White is missing two or more pieces in one of the positions compared to the other. The filter ~(source & target) yields the squares that are not occupied by the same pieces in both positions. The filter A & source:_ uses set intersection to identify the squares occupied by white pieces in the target position that are empty in the source position. If this intersection has the same value as the differences variable then the echo filter will match the position.
Note that for any positions X and Y the query:
X & Y
is equivalent to:
square sq in . {
X:colortype sq == Y:colortype sq
}
String Filters
| Filter | Description | Example | Result |
|---|---|---|---|
| + | String concatenation | "hello " + "world" | "hello world" |
| # | String cardinality | #"hello" | 5 |
| ~~ | Regex matching | "XABACA" ~~ "(A.)+" | "ABAC" |
| \n | Regex group extraction | \1 | "AC" |
| \-n | Regex group index | \-1 | 3 |
| \{name} | Named group extraction | \{year} | "2024" |
| \-{name} | Named group index | \-{year} | 0 |
| ascii | Character-ordinal conversion | ascii "A"<br>ascii 38 | 65<br>"&" |
| in | Substring search | "ll" in "hello" | true |
| indexof | Substring index | indexof("ll" "hello") | 2 |
| int | Numeric conversion | int("0123") | 123 |
| lowercase | Lowercases string | lowercase "Hello" | "hello" |
| replace | Regex replacement | replace("abcd" ".c" "X") | "aXd" |
| str | String conversion | str(1 false "abc") | "1falseabc" |
| uppercase | Uppercases string | uppercase "Hello" | "HELLO" |
The + binary infix string filter takes two string arguments and yields a string value that is the concatenation of its operands unless one of its operands does not match the position in which case neither does the + filter. The compound assignment filter += may also be used to append a string to a string variable. In general, X += Y is much more efficient than X = X + Y, and the latter should be avoided for long strings.
When the argument to the unary prefix # cardinality filter is a string, it yields the number of Unicode code points comprising the string (which may be different than the number of bytes or graphemes). The individual code points in a string may be accessed using String Slicing.
The ~~ binary infix string filter is a regular expression matching and extraction operator. The left-hand operand is a string filter and the right-hand operand is a string containing a regular expression pattern. The result is a string that corresponds to the first matching portion of the left-hand operand. Invalid regular expression patterns are diagnosed at query compilation time if the pattern is a string literal, otherwise such errors are diagnosed at runtime with a fatal error. If the left-hand operand does not match the position or there is no pattern match, the filter does not match the position.
By default, matching is case-sensitive and a match may occur anywhere in the string. See Regex Matching Flags for performing case-insensitive matches and Anchored Patterns to limit matches to the beginning or end of a string.
The \n group extraction filter yields the text of the nth capture group associated with the most recently evaluated regex matching filter. The \-n filter yields the starting index of the text matched by the nth capture group of the most recently evaluated regex matching filter, relative to the beginning of the target match string. Named capture groups may be extracted using \{name} and \-{name} which yield the matched text and starting index, respectively, of the capture group with the specified name. If there was no previously evaluated regex matching filter, or the most recently evaluated regex matching filter did not match or did not perform a capture corresponding to the provided group number or name, the result of these filters will be None. These filters will yield None after the final iteration of a regex iteration filter.
The ascii filter accepts either a String or a Numeric operand. When provided with a String operand, if the string consists of exactly one character (code point) and the binary value of the character is 127 or less, the value of the filter is this value, otherwise the filter yields None. When provided with a Numeric operand in the range of 0-127, the result is the ASCII character with the provided value.
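A small Python model of the two directions of the ascii filter is shown below. The function name ascii_filter is hypothetical, None represents a filter that does not match, and the handling of a Numeric operand outside 0-127 is an assumption because the text above does not specify it.

```python
from typing import Optional, Union


def ascii_filter(x: Union[str, int]) -> Optional[Union[int, str]]:
    """Model of ascii: string -> ordinal, or number -> character."""
    if isinstance(x, str):
        # Exactly one code point whose value is 127 or less.
        if len(x) == 1 and ord(x) <= 127:
            return ord(x)
        return None
    if 0 <= x <= 127:
        return chr(x)
    return None  # assumption: out-of-range numbers are modelled as None


print(ascii_filter("A"), ascii_filter(38))
```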
The in binary infix string filter yields true if the value of the left-hand string operand appears anywhere in the right-hand string operand, otherwise the filter does not match the position.
The indexof filter takes a parenthesized argument list consisting of two String arguments. If the first string appears within the second string, the result is the index in the second string at which the first string appears, otherwise the filter does not match the position. If the first string appears multiple times in the second string, the result is the location of the earliest occurrence.
The uppercase and lowercase filters each accept a single string filter argument and yield the uppercased or lowercased string. Unicode-aware case conversion is performed by the uppercase and lowercase filters, e.g.:
uppercase "Criança" ⇒ "CRIANÇA"
uppercase "Strauß" ⇒ "STRAUSS"
lowercase "Æ" ⇒ "æ"
The int filter accepts a single string argument and attempts to extract an integral value from the string yielding the numeric result if successful. If no integral value could be extracted, the int filter does not match the position. The int filter first skips any initial whitespace and then looks for a sequence of decimal characters, optionally prefixed by a single plus or minus sign. The resulting value is converted to a numeric value. Non-decimal characters following a valid numeric sequence are ignored.
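The parsing steps above can be sketched in Python as follows. This is a model under assumptions, not the CQLi implementation: int_filter is a hypothetical name, "decimal characters" is taken to mean the ASCII digits 0-9, whitespace follows Python's definition, and None represents a filter that does not match.

```python
import re
from typing import Optional

# Leading whitespace, then an optional single sign, then one or more digits.
_INT_PREFIX = re.compile(r"\s*([+-]?[0-9]+)")


def int_filter(s: str) -> Optional[int]:
    """Model of int: extract a leading integer, ignoring trailing text."""
    m = _INT_PREFIX.match(s)
    return int(m.group(1)) if m else None


print(int_filter("0123"), int_filter("  -42abc"), int_filter("abc"))
```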
The replace filter has the form:
replace(subject-string pattern-string replacement-string [ count ])
and yields a copy of the subject-string with portions of the string that match the pattern-string replaced with the replacement-string.
If pattern-string is a string literal, it will be checked for syntax errors at parse time, otherwise regular expression syntax errors in the pattern string will be diagnosed at runtime with a fatal error. Patterns provided as string literals are also faster to evaluate as the regular expression only needs to be compiled once instead of every time the replace filter is evaluated.
In the replacement-string, the \uxxxx and \UXXXXXXXX sequences are expanded to their corresponding Unicode characters and named and numbered capture groups from pattern-string may be accessed using the syntax $# where # is the number of the captured group or ${name} where name is the name of a named capture group. Unmatched groups will be replaced with an empty string. Invalid capture group names or numbers following a $ will result in a fatal runtime error. A literal $ can be obtained by using \$. All other characters are taken literally. Note that the \n regex group extraction and the \-n regex group index filters that work with the ~~ filter do not interact with the replace filter in any way. The capture groups of a replace filter are accessible only within the pattern and replacement strings of the filter.
If count is not provided or has a value of 0, all instances that match the provided pattern-string are replaced. If count is greater than 0 then only the first count matches will be replaced. If count is negative, then only the last -count matches will be replaced. If the magnitude of count is greater than the number of matches in subject-string, all instances will be replaced.
If any of the arguments supplied to the replace filter does not match the position then the result of the replace filter will not match. Otherwise the replace filter will match the position even if no replacements occur (in which case the result is the value of the subject-string) or the result is an empty string.
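The effect of the count argument can be modelled with Python's re module. This sketch makes several simplifying assumptions: replace_model is a hypothetical name, the replacement is inserted literally (no $1 or ${name} expansion), and Python's regex engine is used in place of the one used by CQLi.

```python
import re


def replace_model(subject: str, pattern: str, replacement: str, count: int = 0) -> str:
    """Model of replace: count selects which matches are replaced."""
    matches = list(re.finditer(pattern, subject))
    if count > 0:
        selected = matches[:count]    # first 'count' matches
    elif count < 0:
        selected = matches[count:]    # last '-count' matches
    else:
        selected = matches            # 0 or omitted: every match
    parts = []
    pos = 0
    for m in selected:
        parts.append(subject[pos:m.start()])
        parts.append(replacement)
        pos = m.end()
    parts.append(subject[pos:])
    return "".join(parts)


print(replace_model("abcd", ".c", "X"))
print(replace_model("a-a-a", "a", "X", -1))
```

List slicing clips automatically, so a count whose magnitude exceeds the number of matches replaces all of them, as described above.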
The str filter accepts a parenthesized argument list consisting of one or more filters of any type. The str filter converts each of its arguments to a string value and yields the concatenated result. The str filter always matches the position. Values are converted to strings as described below. If str is used with a single filter the parentheses are optional.
String Portrayal of Types
Any value can be converted to its string representation using the str filter. String conversion also occurs for the arguments of the comment and message filters. The table below explains how each type is portrayed in its string representation.
| Type | Portrayal description | Examples |
|---|---|---|
| Numeric | Decimal digit representation with a leading minus sign for negative values. | 1234<br>-34 |
| Set | A bracket enclosed, comma-separated list of squares in ascending rank-first order. | [d4,e4,d5,e5]<br>[] |
| Boolean | true or false | true |
| Piece variable | Piece type character followed by the square on which it resides or [absent] if the piece is not present on the board in the current position. | Ke2<br>re8<br>Pg5<br>[absent] |
| Position | The string “move” followed by the move number and either “(wtm)” or “(btm)” indicating the side to move. If the position is not a mainline position, the position ID enclosed in square brackets is appended. | move 1(wtm)<br>move 72(btm)<br>move 4(wtm)[14] |
| Dictionary | The string “Dictionary with n entries” (where n is the number of entries in the dictionary) followed by a colon and newline (unless there are 0 entries) and each key/value pair in the dictionary separated by newlines. | Dictionary with 2 entries:<br>a: 123<br>b: 456<br>Dictionary with 1 entry:<br>x: abc |
String values are unchanged. A filter that yields None is portrayed as <None> regardless of the type of the filter.
Predefined Strings
The five backslash sequences are filters that always yield the String value indicated in the table below.
| Sequence | Value |
|---|---|
| \n | Newline character |
| \r | Carriage return |
| \t | Tab |
| \" | Double quote |
| \\ | Backslash |
Note that these are filters and only have a special meaning outside of string literals. For example:
X = "A" + \n
will assign the string consisting of A followed by the newline character to X but the query:
X = "A\n"
will assign the string consisting of A followed by a backslash and the character n to X.
So, outside a string literal, \n is a predefined-string filter. Inside a string literal, the two characters \ and n are taken literally.
String Slicing
Substrings may be extracted with the [ … ] string slicing filter where … is a string index expression of the form i where i is an arbitrary numeric expression or the form m:n with m and n being arbitrary numeric expressions. In the latter form, either or both of m and n may be omitted. Indices are zero-indexed such that the first character (code point) of string x is represented by x[0].
There are two importantly different forms:
- Single-index form: x[i] yields the ith character of x if i is valid and None otherwise.
- Slice form: x[m:n] yields a string, possibly empty. Out-of-range slice endpoints are clipped as described below; this form yields None only if x itself does not match the position.
The index of string x specified by a negative value j is #x + j, thus x[-1] represents the last character in x, x[-2] represents the next-to-last character in x, etc.
When the form x[m:n] is used, the result is the substring of x that starts at index m and ends at index n-1. If the index specified by m is greater than or equal to the index specified by n, or the index specified by m is not a valid index into the string x, the result is an empty string. Otherwise, if the index specified by n is greater than the largest valid index for x, the substring ends at the end of x. Likewise, if the index specified by m refers to a character before the start of x, the substring starts at the beginning of x.
The result of string slicing is therefore:
- the extracted string (which may be empty) for the slice form x[m:n], or
- None for the single-index form x[i] if i does not specify an existing index,

and in either case None if x itself does not match the position.
| Example | Result |
|---|---|
| "abcde"[4] | "e" |
| "abcde"[5] | None |
| "abcde"[-5] | "a" |
| "abcde"[-6] | None |
| "abcde"[1:1] | "" |
| "abcde"[1:2] | "b" |
| "abcde"[1:] | "bcde" |
| "abcde"[:3] | "abc" |
| "abcde"[-4:-1] | "bcd" |
| "abcde"[-4:100] | "bcde" |
| "abcde"[-10:10] | "abcde" |
| "abcde"[10:20] | "" |
| "abcde"[:] | "abcde" |
Note in particular that the first n characters of string x are obtained with x[:n] and the last n characters of string x are obtained with x[-n:].
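The two slicing forms can be modelled in Python, whose own slice clipping happens to agree with the rules described above. The helper names index_form and slice_form are hypothetical, and None represents a filter that does not match the position.

```python
from typing import Optional


def index_form(x: Optional[str], i: int) -> Optional[str]:
    """Model of x[i]: one code point, or None if i is not an existing index."""
    if x is None:
        return None
    if -len(x) <= i < len(x):
        return x[i]
    return None


def slice_form(x: Optional[str], m: Optional[int] = None, n: Optional[int] = None) -> Optional[str]:
    """Model of x[m:n]: endpoints are clipped, so the result may be empty."""
    if x is None:
        return None
    return x[m:n]


print(index_form("abcde", 5), repr(slice_form("abcde", 10, 20)))
print(slice_form("abcde", -4, 100), slice_form("abcde", None, 3))
```

The key difference in this model is that only the single-index form can fail on a matching string; the slice form always yields a string.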
Slicing Assignment
If the left-hand side of a string slicing operation is a variable, the slice may be assigned using the syntax x[ … ]= String where String is an arbitrary string filter. The portion of the substring referenced by the string index expression is replaced with String. String may be a different size than the referenced substring in which case the string value of x is modified to accommodate the replacement. The result of the slicing assignment filter has Boolean type and matches the position unless x is unbound, String does not match the position (i.e. it is None), or the left-hand side of the assignment has the form x[i] and i specifies an index that does not exist in x. If the slicing assignment does not match the position, x is not modified although the reverse is not necessarily true, e.g. given a variable x of length 5, the filter x[10:] = "abc" will match the position even though x is not modified.
As with slicing itself, single-index and range forms behave differently here:
- x[i] = ... fails if i is not a valid existing index.
- x[m:n] = ... may succeed even when the replacement changes nothing, for example because the referenced slice is empty and the replacement is also empty, or because the slice lies entirely beyond the end of the string.
| Example | Value of x | Expression Result |
|---|---|---|
| x = "abc"<br>x[0] = "" | "bc" | true |
| x = "abc"<br>x[1] = "xxx" | "axxxc" | true |
| x = "abc"<br>x[1:] = "" | "a" | true |
| x = "abc"<br>x[:-2] = "" | "bc" | true |
| x = "abc"<br>x[0:0] = "x" | "xabc" | true |
| x = "abc"<br>x[5] = "x" | "abc" | false |
Code Points and Graphemes
The values returned by the indexof, # string cardinality, and \-n group index filters, and the values used in the string index expression of slicing filters, represent code point indices into the corresponding strings, e.g. X[3] represents the fourth code point in X regardless of how many bytes or code units are required to represent the string X (CQLi does not expose access to the internal representation of Unicode code points). To iterate over the code points in a string, the following query may be used:
$idx = 0
while ($idx < #X) {
$codepoint = X[$idx]
$idx += 1
}
Code points may also be iterated with the regex iteration filter which is more concise but somewhat less efficient:
while (X ~~ ".") {
$codepoint = \0
}
To iterate over the extended grapheme clusters of a string, replace "." with "\X":
while (X ~~ "\X") {
$codepoint = \0
}
The starting code point offset of every grapheme cluster can be accessed using \-0 and the length of the grapheme, in code point units, obtained with #\0. For example, the query:
while (X ~~ "\X") {
message("Grapheme of length " #\0 " that starts at position " \-0 ": " \0)
}
will emit a message containing the length and starting position of every grapheme cluster in X.
String Limitations
Strings that require more than one billion UTF-16 code units to represent are not supported.
Regular Expression Matching
The ~~ and replace filters utilize regular expressions. This section provides an overview of some of the fundamental regular expression capabilities supported by the patterns that these filters accept. For more information see Regular-Expressions.info which is a great resource detailing the features provided by various regular expression implementations, including that employed by CQLi (ICU version 78 or later).
Regex Syntax Fundamentals
Regular expressions provide a powerful mechanism to search for patterns within text using facilities such as repetition, alternation, and character classes.
A regular expression consists of characters that represent themselves (including letters and digits) and characters with a special meaning (*, ?, +, {, }, (, ), [, ^, $, |, \, and .). The backslash is used to escape special characters (to cause them to represent themselves) and to give special meaning when preceding certain characters that are not normally special.
Repetition
Repetition operators allow part of a pattern to optionally match or to match multiple times. The * operator specifies that the previous character is present zero or more times, + matches the previous character one or more times, and ? matches the previous character zero or one time. For example, in the query:
var ~~ "\d+:\d+"
\d represents any digit and + indicates one or more of what immediately preceded it so \d+ represents one or more digits. The : matches itself so the pattern \d+:\d+ will match if var contains a sequence consisting of one or more digits followed by a colon and one or more digits immediately following the colon, e.g. "Time 1:23" will match the pattern (with the result being "1:23") but "123:" and ":123" will not.
The basic repetition operators (*, +, ?, and {...}) will match as much of the string as possible (i.e. they are greedy matching operators) without causing a match failure (i.e. they are non-possessive). For example:
"ABBB" ~~ "AB*"
will match the entire string and:
"ABBBCABD" ~~ "AB*D"
will match "ABD". The first successful match is always returned regardless of whether a later match would consume a larger portion of the string, e.g.:
"ABABBABBBB" ~~ "AB+"
will match the initial sequence "AB", not a later (and longer) sequence. Repetition operators may be made non-greedy (matching as little as possible) by suffixing them with ?. For example:
"ABBB" ~~ "AB+?"
will match "AB" since the expression B+? requires at least one B and prefers to match the smallest sequence. Non-greedy repetition is useful when trying to match delimited text. For example:
"#A# #B# #C#" ~~ "#.*#"
will match the # character followed by any number of any character (. represents any non-newline character) followed by another #. Because * is greedy, the match extends to the last # and covers the entire string. To extract only the first delimited portion (#A#) the non-greedy version of * may be used:
"#A# #B# #C#" ~~ "#.*?#"
The basic greedy and non-greedy repetition operators are summarized in the table below.
| Operator | Description |
|---|---|
| ? | Matches zero or one times, prefers to match once. |
| * | Matches zero or more times, matches as much as possible. |
| + | Matches one or more times, matches as much as possible. |
| {n} | Matches exactly n times. |
| {n,m} | Matches between n and m times, matching as many times as possible. |
| {n,} | Matches n or more times, matching as many times as possible. |
| ?? | Matches zero or one times, prefers to match zero times. |
| *? | Matches zero or more times, matches as little as possible. |
| +? | Matches one or more times, matches as little as possible. |
| {n,m}? | Matches between n and m times, matching as few times as possible. |
| {n,}? | Matches n or more times, matching as few times as possible. |
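The greedy and non-greedy examples in this section can be tried outside of CQLi. The sketch below uses Python's re module, which is a different engine from the ICU one used by CQLi but is assumed to behave the same way for these basic operators. The helper first_match is hypothetical and models the ~~ filter by returning the first matching substring, or None if there is no match.

```python
import re
from typing import Optional


def first_match(subject: str, pattern: str) -> Optional[str]:
    """Return the first (leftmost) matching portion of subject, or None."""
    m = re.search(pattern, subject)
    return m.group(0) if m else None


print(first_match("ABBB", "AB*"))             # greedy
print(first_match("ABBB", "AB+?"))            # non-greedy
print(first_match("#A# #B# #C#", "#.*#"))     # greedy: spans to the last '#'
print(first_match("#A# #B# #C#", "#.*?#"))    # non-greedy: stops at the next '#'
```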
Alternation
The | character is the alternation operator; A|B will match either A or B.
Character Classes
A character class matches any character from the specified square bracket-enclosed set. For example, the regex ch[aio]p will match chap, chip, or chop. Ranges may be created by separating two characters by a dash, e.g. [A-Z] will match any character with a Unicode code point value between (inclusive) the values used to represent A and Z. A negated character class can be specified by using ^ as the first character in the class in which case the class matches anything except the contained values, e.g. [^A-Z] will match any character except A through Z.
Character classes may be nested, e.g. [[A-Z][a-z]] is equivalent to [A-Za-z]. The && and -- class operators may be used to perform set intersection and set subtraction, respectively, to form the resulting class. For example, [\p{S}--\p{Sm}] will match a non-math symbol character and [\p{L}&&\p{script=Cyrl}] will match any Cyrillic letter. Any of the Escape Sequences except for \A, \b, \B, \R, \X, \z, \Z, and backreferences may be used in a character class. The POSIX character classes are also supported, e.g. [[:ascii:]] will match any ASCII character.
Groups, Captures, and Backreferences
Parentheses are used to form a group which is treated as a unit. Repetition operators following such a group apply to the entire text matching the group. For example the pattern (\d+:)+ will match a sequence of one or more groups of text, each containing one or more digits followed by a colon.
By default, groups perform captures meaning that their corresponding matching text may be referenced later in the pattern. Text matching a captured group is accessible using backreferences which consist of a backslash followed by an index i (starting at 1) that represents the ith capture group appearing in the pattern. For example, the pattern \d\d\d will match any three digits but the pattern (\d)\1\1 will only match three identical digits (e.g. 111, 222, etc).
Backreferences may also appear outside of regular expression patterns to extract text matching capture groups from the most recently evaluated ~~ filter. Additionally, \0 may be used outside of a pattern and yields the entire matched text (which is also the result of the ~~ filter). For example, the following query will match the first appearance of a character appearing three or more times in a row with the repeated character available after the match as \1:
"ABCCDEEEEF" ~~ "(.)\1{2,}"
\0 == "EEEE"
\1 == "E"
Named capture groups may be extracted outside of patterns using \{name} which yields the matched text of the named group, and \-{name} which yields its starting index. For example:
"2024-01-15" ~~ "(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})"
\{year} == "2024"
\{month} == "01"
\{day} == "15"
\-{year} == 0
\-{month} == 5
Named capture group names must start with a letter and contain only ASCII letters and digits.
Groups may be nested to an arbitrary depth.
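The capture results quoted in this section can be reproduced with Python's re module as a rough cross-check. Note the assumptions: Python is a different regex engine from the one used by CQLi, it spells named groups as (?P<name>...) rather than (?<name>...), and the match-object methods group() and start() stand in for the CQL filters \n, \-n, \{name}, and \-{name}.

```python
import re

# Backreference inside the pattern: a character repeated three or more times.
run = re.search(r"(.)\1{2,}", "ABCCDEEEEF")
print(run.group(0), run.group(1))        # models \0 and \1

# A repeated group keeps only its last capture.
rep = re.search(r"(A.)+", "XABACA")
print(rep.group(0), rep.group(1), rep.start(1))   # models \0, \1 and \-1

# Named groups (Python syntax differs from the CQL pattern shown above).
date = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", "2024-01-15")
print(date.group("year"), date.start("month"))    # models \{year} and \-{month}
```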
In addition to the basic grouping/capturing parentheses, there are several special parenthetical constructs that introduce various behaviors. These are summarized in the table below.
| Syntax | Description |
|---|---|
| (...) | Capturing parentheses. The portion of the string that matches ... will be available via a backreference. |
| (?<name>...) | Named capturing parentheses. The portion of the string that matches ... will be available both using a numeric backreference and as a named group within the pattern using \k<name>. Outside of the pattern, the matched text and starting index can be extracted using \{name} and \-{name} respectively. |
| (?: ...) | Non-capturing parentheses. Used to group an expression without capturing the contents of the matching portion. |
| (?= ...) | Positive lookahead assertion. The ... portion must match at the current position being matched but the matching portion is not consumed. |
| (?! ...) | Negative lookahead assertion. The ... portion must not match at the current position being matched. |
| (?<= ...) | Positive lookbehind assertion. The ... portion must match the part of the text that immediately precedes the current position. The ... portion may not contain unbounded repetition (e.g. no + or * operators). |
| (?<! ...) | Negative lookbehind assertion. The ... portion must not match the part of the text that immediately precedes the current position. The ... portion may not contain unbounded repetition (e.g. no + or * operators). |
| (?> ...) | Atomic-match parentheses. The ... portion is matched possessively. |
| (?# ...) | Comment parentheses. The entire parenthetical construct is ignored. |
For example, the patterns (fl|spl)at, (?<prefix>fl|spl)at, and (?:fl|spl)at will all match the same text. The difference is that the matched prefix (fl or spl) is available via the \1 backreference in the first two cases and, in the second case, is additionally available via the named backreference \k<prefix> later in the same pattern or via \{prefix} outside of the pattern.
Lookahead and lookbehind assertions require specific text to be present at a particular point in a match in order to continue. Use cases for these constructs, as well as possessive matching, are more advanced and outside the scope of this introduction.
Escape Sequences
The backslash \ character is used to escape regex meta characters in patterns and access backreference content of previously matched capture groups. The backslash may also be used to start one of several escape sequences as described in the table below.
| Sequence | Description |
|---|---|
| \a | Matches the BELL character, i.e. \u0007. |
| \A | Matches the beginning of a string. Unlike ^, will not match after a newline. |
| \b | Matches at a word boundary. |
| \B | Matches when the current position is not at a word boundary. |
| \cX | Matches a control-X character where X is in the range A-Z. |
| \d | Matches any decimal digit character (Unicode General Category Nd). |
| \D | Matches any non-decimal digit character. |
| \e | Matches the ESCAPE character, i.e. \u001B. |
| \E | Marks the end of the most recent quoting sequence begun with \Q. |
| \f | Matches a FORM FEED character, i.e. \u000C. |
| \h | Matches a horizontal whitespace character, i.e. HORIZONTAL TABULATION (\u0009) or Unicode General Category Zs. |
| \H | Matches a non-horizontal whitespace character. |
| \k<name> | Named capture backreference. |
| \n | Matches a LINEFEED character, i.e. \u000A. |
| \N{NAME} | Matches a code point with the specified character name, e.g. \N{Latin Capital letter C with cedilla} will match the character Ç (\u00C7). |
| \p{NAME} | Matches a Unicode code point with the specified property name, e.g. \p{Lt} will match a titlecase letter. |
| \P{NAME} | Matches a character that does not have the specified property name. |
| \Q | Quotes characters between the \Q and the next \E sequence, e.g. \Q()\E will match the literal text () (instead of treating the parentheses as a group). |
| \r | Matches a CARRIAGE RETURN character, i.e. \u000D. |
| \R | Matches the sequence CARRIAGE RETURN + LINEFEED or a newline character (one of \u000A, \u000B, \u000C, \u000D, \u0085, \u2028, or \u2029). |
| \s | Matches a whitespace character (equivalent to the character class [\t\n\f\r\p{Z}]). |
| \S | Matches a non-whitespace character. |
| \t | Matches a HORIZONTAL TABULATION character, i.e. \u0009. |
| \uhhhh | Matches the Unicode code point with the provided 4-digit hexadecimal value. |
| \Uhhhhhhhh | Matches the Unicode code point with the provided 8-digit hexadecimal value. |
| \v | Matches a newline character, i.e. \u000A, \u000B, \u000C, \u000D, \u0085, \u2028, or \u2029. |
| \V | Matches a non-newline character. |
| \w | Matches a word character, equivalent to [\p{L}\p{Nl}\p{M}\p{Nd}\p{Pc}\u200c\u200d]. |
| \W | Matches a non-word character. |
| \xhh | Matches the code point with the provided two-digit hexadecimal value. |
| \x{hhhhhh} | Matches the code point with the provided 1-6 digit hexadecimal value. |
| \X | Matches a Grapheme Cluster which may consist of multiple code points. |
| \z | Matches the end of the string. |
| \Z | Matches the end of the string or if \R$ would match at the current position. |
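The difference between \A and ^, and the behavior of \b, can be illustrated outside of CQLi. The sketch below uses Python's re module purely as a stand-in; it is not CQLi's regex engine, and details of other escape sequences may differ between the two. The re.MULTILINE flag is passed explicitly to mimic a setting in which ^ also matches after a newline, and the sample strings are made up for illustration.

```python
import re

# Hypothetical sample text containing a newline.
text = "first line\nsecond line"

# With multi-line matching enabled, ^ matches at the start of every line.
caret_hits = re.findall(r"^\w+", text, flags=re.MULTILINE)

# \A matches only at the very beginning of the string, never after a newline.
start_hits = re.findall(r"\A\w+", text, flags=re.MULTILINE)

# \b requires a word boundary on both sides, so "liner" and "inline" are skipped.
boundary_hits = re.findall(r"\bline\b", "line liner inline line")

print(caret_hits)
print(start_hits)
print(boundary_hits)
```

In this sketch, only the first word is found with \A, while ^ finds the first word of each line.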
Anchored Patterns
Patterns are not anchored by default, meaning that the matching substring may occur in any part of the string. The special ^ and $ characters may be used to match the beginning or end of the string, respectively. For example:
var ~~ "^\d+"
will match one or more digits appearing at the start of the string.
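The effect of anchoring can be sketched with Python's re module, used here only as an analogy for the unanchored matching described above (it is not the engine CQLi uses). The helper name matches and the sample strings are assumptions made for illustration.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text (unanchored search)."""
    return re.search(pattern, text) is not None


# Unanchored: the digits may appear in any part of the string.
unanchored = matches(r"\d+", "move 42")

# Anchored with ^: the digits must appear at the start of the string.
anchored_miss = matches(r"^\d+", "move 42")
anchored_hit = matches(r"^\d+", "42 moves")

print(unanchored, anchored_miss, anchored_hit)
```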
Finding all Matches
When the condition of the while filter is a regular expression matching filter, the filter becomes a regex iteration filter. In this case, the LHS of the ~~ operator is evaluated once, after which the pattern provided on the RHS is applied to the LHS argument repeatedly until it no longer matches, with the body of the while filter evaluated after each match. For example:
while ("ABC" ~~ ".") {
  message \0
}
will print A on the first iteration, B on the second, and C on the third and final iteration. Note that the pattern provided as the RHS of the ~~ operator is processed only once, even if the pattern is a string variable whose value changes after iteration begins. This form of the while filter matches the position even if the pattern never matches; it fails to match only when the LHS argument itself does not match.
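The iteration described above can be approximated in Python as a rough analogy. This is a sketch under assumptions, not a description of how CQLi implements the while filter: the function name iterate_matches is hypothetical, and Python's re.finditer stands in for the repeated application of the pattern.

```python
import re


def iterate_matches(subject: str, pattern: str) -> list:
    """Collect the text of each successive match, mimicking one loop body per match."""
    compiled = re.compile(pattern)  # the pattern is processed a single time
    results = []
    for match in compiled.finditer(subject):
        results.append(match.group(0))  # group(0) plays the role of \0
    return results


printed = iterate_matches("ABC", ".")
print(printed)

# A pattern that never matches simply yields no iterations.
no_iterations = iterate_matches("ABC", r"\d")
print(no_iterations)
```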
Regex Matching Flags
There are several flags that may be embedded in a regular expression to change the default matching behavior in different ways. The table below describes these flags.
| Flag | Default | Description |
|---|---|---|
| i | OFF | If set, matching occurs in a case-insensitive manner. |
| m | ON | If set, the ^ and $ anchors will match the beginning and end of a line, respectively, in addition to matching the beginning or end of a string. |
| s | OFF | If set, the . character will match a line terminator (e.g. line-feed, vertical tab, form-feed, carriage-return, or a carriage-return line-feed sequence). |
| w | OFF | If set, the \b sequence matches word boundaries in accordance with Unicode UAX 29 which employs a much more sophisticated, and slower, locale-dependent behavior than the simple word/non-word character classification employed when this flag is not set. |
| x | OFF | If set, whitespace in regular expressions does not have any special meaning (use \s to match whitespace instead) and everything from a # character to the end of a line is ignored by the regex parser. Using this flag facilitates commenting complex regular expressions that may also span multiple lines. |
The values of the above flags may be set within a regular expression using the syntax:
(?{imswx}[-]{imswx})
which will set the flag values until the next flag setting expression appears or the end of the pattern is reached. The flags may also be modified just for a sub-pattern using the syntax:
(?{imswx}[-]{imswx}:…)
where … is the pattern that should be subject to the flag modifications. Flags appearing without a preceding - are turned ON; flags appearing after a - are turned OFF.
For example, the following query performs a case-insensitive search for “Michael Jones”:
$name ~~ "(?i)Michael Jones"
The following query performs a case-insensitive search for “Michael” followed by a case-sensitive search for “Jones”:
$name ~~ "(?i)Michael (?-i)Jones"
The same effect could be realized by using the second form above to limit the flag modification to a single sub-pattern:
$name ~~ "(?i:Michael) Jones"
The following pattern shows how multiple flags may be enabled and disabled at once:
(?is-mw)
which will turn the i and s flags ON and turn the m and w flags OFF.
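The difference between a pattern-wide flag and a flag scoped to a sub-pattern can be sketched in Python, whose re module accepts a leading (?i) and the scoped (?i:…) form. This is only an analogy: Python's engine is not the one CQLi uses, it does not support switching a flag off mid-pattern with (?-i), and it has no w flag. The sample names are taken from the examples above; the variable names are assumptions.

```python
import re

# Flag applied to the whole pattern: both words match case-insensitively.
everywhere = re.compile(r"(?i)Michael Jones")

# Flag scoped to a sub-pattern: only "Michael" is case-insensitive.
scoped = re.compile(r"(?i:Michael) Jones")

whole_hit = everywhere.search("MICHAEL JONES") is not None
scoped_hit = scoped.search("MICHAEL Jones") is not None
scoped_miss = scoped.search("michael JONES") is not None

print(whole_hit, scoped_hit, scoped_miss)
```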
Ranges
Several filters accept an optional range argument. A range consists of one or two numeric filters, each of which must be either a numeric variable or a numeric constant. The first filter in a range may be the operand of a negation operator. When a single range constituent is provided, the resulting range represents a single value; otherwise, the range represents the set of values between (and including) the supplied endpoints. Potential range elements must not be parsable as part of a larger expression, e.g. find 1 10 -3 will be parsed as a find filter with a range of [1:1] and a body of 10-3, not as a range of [1:10] with a body of -3.
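The inclusive semantics of a range can be modeled with a few lines of Python. This is a minimal sketch of the set of values a range represents, assuming integer endpoints; the helper make_range is hypothetical and says nothing about how CQLi parses or stores ranges.

```python
def make_range(first, second=None):
    """Model a range as the inclusive set of integers it represents (hypothetical helper)."""
    # A single constituent represents exactly one value.
    low, high = (first, first) if second is None else (first, second)
    return range(low, high + 1)  # +1 because both endpoints are included


single = list(make_range(1))
span = list(make_range(50, 64))
negative = list(make_range(-3, 2))  # the first endpoint may be negated

print(single)
print(span[0], span[-1], len(span))
print(negative)
```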
The filters accepting a range argument are: consecutivemoves, find, line, and the direction and transform filters (use of ranges in transform filters is deprecated). In the very unusual situation where the body of one of these filters might unintentionally be interpreted as a range, parentheses or braces can be used around the body to prevent it from being parsed as a range.
Ranges were used extensively in CQL 5 which did not possess arithmetic comparison operators and instead relied on ranges to specify target values for many filters. For example, to find positions where White is attacking 50 or more squares in CQL 5, the query attack 50 64 (A .) was used. In CQL 6, the equivalent query is . attackedby A >= 50. CQL 6 retains ranges in several places where the same functionality would not be easily expressed without ranges but they play a much smaller role than they did in CQL 5.
Identity Operator
The identity operator is represented with the backtick character (`). The identity operator is a prefix filter taking a single filter operand of any type. The type and value of the identity operator are those of its operand.
The identity operator never changes the type or value produced by its operand. Instead, it changes how the operand is interpreted in context. The identity operator has the following contextual effects:
- When used to prefix a variable that is an argument in a function call, it suppresses pass-by-ref semantics.
  For example, after the following query is executed, $var1 will be set to 0 but $var2 will retain its value:
  $var1 = 10
  $var2 = 20
  function clear($param) { $param = 0 }
  clear($var1)
  clear(`$var2)
- It forces the normally suppressed PieceID-to-Set conversion in a function call or within a message or comment filter.
  For example, in the starting position, the first message filter below will print Ke1 while the second message filter will print e1:
  piece King = K
  message(K)
  message(`K)
- It forces evaluation of arguments in a message or comment filter that would normally emit a string.
  For example, the filter message(event "X") will print the value of the Event tag followed by the string X but message(`event "X") will print the Boolean value of the event "X" filter (true if the Event tag contains the string "X" and false otherwise).
- When appearing before the key expression in a dictionary access filter, it will prevent the square brackets from being treated as a piece designator.
  For example, if Dx is a dictionary, Dx[a2] will be interpreted as the two filters Dx and [a2]. Using Dx[`a2] will instead cause the filter to be treated as a dictionary access filter with a Set key.