
The line Filter

Match move sequences with the line filter, including parameter semantics, linearization behavior, and atomic evaluation.

The line filter is used to search for a sequential series of positions, starting from the current position, that match a prescribed pattern. A pattern consists of one or more constituents that describe when a position matches. The ability to apply pattern repetition to individual constituents, borrowed from regular expressions, makes the line filter one of the most powerful filters in the CQL language, which in turn lends itself to a variety of applications, including Longest Consecutive Sequences.

Description

A line filter consists of zero or more parameters and an optional range followed by one or more constituent filters, each of which is introduced with the <-- or the --> token. Every constituent within the same line filter must be introduced by the same token, i.e. <-- and --> may not be mixed within a line filter. The line filter then evaluates each constituent filter against consecutive positions in a single line, starting with the current position. If every constituent filter matches a corresponding position the specified number of times, the line filter matches and yields the length of the longest matching line. Consecutive positions are immediate child positions of the current position when --> is used (forward-looking sequence) and parent positions when <-- is used (backwards-looking sequence).

For example, consider the following query:

line --> check
     --> move previous capture (flipcolor A attacks k)
     --> mate

which will match when the current position is check, the check is resolved by capturing a piece attacking the king, and the subsequent move results in mate.

Constituent Repetition

Constituents of a line filter may optionally be followed by a quantifier which modifies the number of times the constituent must match (by default each constituent must match once). The quantifiers that may be used in line constituents are shown in the table below.

Quantifier         Meaning
? or {?}           Match zero times or once.
* or {*}           Match zero or more times.
+ or {+}           Match one or more times.
{n}                Match exactly n times.
{,n}               Match up to n times.
{n,}               Match n or more times.
{m,n} or {m n}     Match between m and n times.

For example, to find sequences that start with a check, followed by any number of captures (including zero), followed by mate, the * quantifier may be added to the move filter constituent:

line --> check
     --> move previous capture . *
     --> mate

Constituent quantifiers are always greedy, meaning they match as many times as possible, but not at the expense of the line filter failing. For example, the query:

line --> move previous capture . +

will match the longest sequence of captures starting at the current position and the query:

line --> move previous capture . + --> mate

will match a series of captures followed by mate even if the move that results in mate is a capture (the move constituent will not consume the final position, allowing the mate constituent to match instead so that the line filter can match).
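The greedy-but-not-fatal behavior described above can be modeled with a small Python sketch. This is only an illustration under simplifying assumptions, not CQLi's actual matching algorithm: a position is a hypothetical set of tags, a constituent is a (test, min, max) triple, and there are no variations. The sketch tracks every offset at which the pattern so far could end and reports the longest complete match.

```python
from typing import Callable, List, Optional, Set, Tuple

Position = Set[str]  # hypothetical model: a position is a set of tags
Constituent = Tuple[Callable[[Position], bool], int, Optional[int]]  # (test, min, max)


def longest_match(line: List[Position], pattern: List[Constituent]) -> Optional[int]:
    """Return the length of the longest prefix of `line` matched by `pattern`, or None."""
    ends = {0}  # offsets at which the constituents processed so far may end
    for test, lo, hi in pattern:
        next_ends = set()
        for start in ends:
            count, pos = 0, start
            if lo == 0:
                next_ends.add(pos)  # matching zero times is allowed
            while pos < len(line) and (hi is None or count < hi) and test(line[pos]):
                count += 1
                pos += 1
                if count >= lo:
                    next_ends.add(pos)
        ends = next_ends
        if not ends:
            return None  # this constituent cannot match anywhere
    return max(ends)


def is_capture(position: Position) -> bool:
    return "capture" in position


def is_mate(position: Position) -> bool:
    return "mate" in position


# Three positions reached by captures; the last one is also mate.
example_line = [{"capture"}, {"capture"}, {"capture", "mate"}]
print(longest_match(example_line, [(is_capture, 1, None)]))
print(longest_match(example_line, [(is_capture, 1, None), (is_mate, 1, 1)]))
```

In the second call the capture constituent is allowed to stop one position early so that the mate constituent can still match the final position, which mirrors the behavior described for the query above.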

If a * or + token appearing in a constituent could be interpreted as multiplication or addition, it will be. To force the token to be interpreted as a repetition quantifier, it may be enclosed in braces, e.g. {?}, {*}, and {+}. In addition to being unambiguous, enclosing repetition quantifiers in braces helps them stand out visually.

The m and n appearing in counted repetition quantifiers must be non-negative Numeric literals and m, if present, must not be greater than n.

Constituent Grouping

Multiple line constituents may be grouped by enclosing the constituents in parentheses and repetition may then be applied to the group as a whole. For example, the following query will find games that end in checkmate with at least three check-then-capture sequences immediately preceding mate:

line --> (check --> move previous capture (flipcolor A attacks k)) {3,}
     --> mate

Constituent groups may be nested, and separate repetition may be specified for constituents within a group. Note that each instance of <-- or --> represents the next position in the specified direction, even within a grouping, e.g. the above query will not match sequences of fewer than seven positions.
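The seven-position minimum follows from simple arithmetic: the group holds two constituents and must repeat at least three times, and the mate constituent adds one more position. A small Python sketch of that calculation is given below; the Single and Group classes are hypothetical names used only for this illustration and are not part of CQL.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Single:
    """One constituent; min_count is the lower bound of its quantifier."""
    name: str
    min_count: int = 1


@dataclass
class Group:
    """A parenthesized group of constituents with its own lower bound."""
    items: List[Union["Single", "Group"]]
    min_count: int = 1


def min_positions(node: Union[Single, Group]) -> int:
    """Smallest number of positions a pattern node can consume."""
    if isinstance(node, Single):
        return node.min_count
    return node.min_count * sum(min_positions(child) for child in node.items)


# Models: line --> (check --> capture) {3,} --> mate
pattern = Group([
    Group([Single("check"), Single("capture")], min_count=3),
    Single("mate"),
])
print(min_positions(pattern))  # 3 * (1 + 1) + 1 = 7
```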

An example of a game matching the above query is here. The diagrams below show the critical positions.

Position before 41...hxg4+
Position after 41...hxg4+ 42.fxg4 Bxg4+ 43.Kxg4 Rxg2+ 44.Rxg2 Qxg2#

Auxiliary Comments

Unless the quiet parameter is specified, the longest sequence found by a matching line filter at the current position will be annotated with auxiliary comments. The matching position at the start of this sequence will be annotated with the comment:

Start line that ends at move end-position

and the position that ends the sequence will be annotated with the comment:

End line of length sequence-length that starts at move start-position

where sequence-length is a positive numeric value that represents the length of the annotated sequence and start-position and end-position are the textual portrayals (as described in String Portrayal of Types) of the positions that start and end the sequence, respectively.

Multiple Matching Sequences

When variations are processed, it is possible for multiple sequences of the same length to be found at the current position. When this happens, the selected sequence (the one that carries auxiliary comments and comments retained by Smart Comments, supplies the final position returned when the lastposition parameter is used, and determines the final state of modified variables) is the one that ends with the position with the lowest position ID. If the --keepallbest option is used, comments (both user comments and auxiliary comments) are preserved for all matching sequences, but the position returned when lastposition is used and the final state of modified variables are the same as when this option is not used.

When multiple constituents in a line filter contain repetition quantifiers, it is possible for a matching sequence to have several ways to match, even outside of variations. For example, consider the query:

initial
line --> comment "A" * --> comment "B" *

applied at the beginning of a game that contains two positions. Since repetition is greedy, we know that both positions will be consumed by this line filter, but how many times will each constituent match? Since the * quantifier can successfully match zero times, the possible outcomes are that the first constituent matches zero times and the second one matches twice, both constituents match once, or the first constituent matches twice. In most cases it won’t make a difference, but it can when the constituents have side effects, such as modifying a variable or adding a comment. When the line filter has multiple ways to match a sequence of the same length, it is unspecified how the positions in the sequence will be correlated to the corresponding constituents. In other words, the result of the above query could be that both positions are commented with “A”, both are commented with “B”, or the first position is commented with “A” and the second with “B”.
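The three outcomes listed above can be enumerated mechanically. The following Python sketch is only a counting exercise, assuming (as with the comment constituents in the example) that every constituent can match any position; it says nothing about which outcome CQLi actually picks, since that is unspecified.

```python
from itertools import product
from typing import Iterator, Tuple


def assignments(num_positions: int, num_constituents: int) -> Iterator[Tuple[int, ...]]:
    """Yield how many positions each `*` constituent consumes when all positions are used."""
    for counts in product(range(num_positions + 1), repeat=num_constituents):
        if sum(counts) == num_positions:
            yield counts


# Two positions shared between: comment "A" *  -->  comment "B" *
for a_count, b_count in assignments(2, 2):
    labels = ["A"] * a_count + ["B"] * b_count
    print(a_count, b_count, labels)
```

Each printed row is one way the same two-position sequence can be matched, together with the comments the positions would receive in that case.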

line Filter Parameters

The table below lists the parameters that may be used with the line filter.

Parameter       Effect
firstmatch      Find the shortest matching sequence instead of the longest.
lastposition    Yield the last matching position instead of the length of the sequence.
nestban         Prevent matching positions from starting a later sequence.
nolinearize     Disable move linearization.
nonatomic       Disable atomic evaluation.
primary         Do not consider positions that start a variation.
quiet           Do not emit auxiliary comments.
secondary       After the first position, only consider positions that start a variation.
singlecolor     Only consider positions with the same side-to-move.

The firstmatch Parameter

When the firstmatch parameter is specified, CQLi will find the shortest sequence that matches the constituent filters instead of the longest. Despite the name, this is not necessarily the “first” successful match that is encountered. This parameter was used in CQL 6 to partially mitigate situations in which variables modified in line constituents could have inconsistent states when backtracking while looking for a longer match, although it doesn’t always achieve that goal. CQLi employs a more robust mechanism (Atomic Evaluation, described below) to ensure variable consistency while evaluating constituents, so specifying firstmatch is not necessary to accommodate such filters. Additionally, CQLi uses a different, non-backtracking matching algorithm, and the behavior closest to that of CQL 6 is simply to yield the shortest matching sequence.

The lastposition Parameter

The lastposition parameter indicates that the last position of the best matching sequence found by line should be returned instead of the length of the longest sequence.

The nestban Parameter

When nestban is used, all of the positions in the sequence of a matching line filter will be banned from starting a sequence in a later evaluation of the same line filter for the same game. The nestban parameter may not be used with backwards-looking line filters (ones where the constituent introducer is <--).

The nestban parameter is used to prevent subsequences of the longest matching sequence from being reported. For example, the following query will find sequences of 5 or more consecutive captures:

line nestban --> move capture . {5,}

Without the nestban parameter, all later subsequences of the initial sequence will also be found, which is typically not desired.
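The effect of nestban can be illustrated with a simplified Python model. It assumes a single line without variations, represented by a hypothetical list of booleans that is True wherever the capture constituent would match; it is a sketch of the banning idea rather than of CQLi's implementation.

```python
from typing import List, Tuple


def capture_runs(matches: List[bool], minimum: int, nestban: bool) -> List[Tuple[int, int]]:
    """Return (start index, length) for each reported run of at least `minimum` matches."""
    banned = set()
    found = []
    for start in range(len(matches)):
        if start in banned:
            continue  # a banned position may not start a new sequence
        end = start
        while end < len(matches) and matches[end]:
            end += 1
        if end - start >= minimum:
            found.append((start, end - start))
            if nestban:
                banned.update(range(start, end))  # ban every position in the sequence
    return found


# One non-matching position, seven matching ones, one non-matching position.
flags = [False] + [True] * 7 + [False]
print(capture_runs(flags, minimum=5, nestban=True))
print(capture_runs(flags, minimum=5, nestban=False))
```

With banning enabled only the full run of seven is reported; without it, the shorter runs of six and five that start later inside the same sequence are reported as well.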

The nolinearize Parameter

The nolinearize parameter disables Move Linearization within all constituents of the line filter. Linearization may be independently disabled for individual constituents as described below.

The nonatomic Parameter

By default, CQLi performs atomic evaluation of constituent filters that modify non-dictionary variables. The nonatomic parameter may be used to suppress this atomic evaluation.

The primary and secondary Parameters

If secondary is specified, only positions that start a variation are considered after the current position. If primary is specified, only positions that do not start a variation are considered after the current position.

The quiet Parameter

If the quiet parameter is provided, auxiliary comments normally added by the line filter will be suppressed for matching sequences.

The singlecolor Parameter

When singlecolor is specified, only positions with the same side to move as the position that starts the sequence are considered. For example, the query:

line singlecolor --> check {10,}

will find sequences where one side checks the other 10 or more times in a row.
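Because the side to move alternates along a line, restricting a sequence to one side to move amounts to looking at every second position. The Python sketch below illustrates this under that assumption, with a hypothetical list of booleans that is True where the side to move is in check; it is not a description of how CQLi implements singlecolor.

```python
from typing import List


def longest_check_run(in_check: List[bool], singlecolor: bool) -> int:
    """Count consecutive matches of a `check` constituent starting at the first position."""
    # With singlecolor, only positions with the starting side to move are considered.
    considered = in_check[::2] if singlecolor else in_check
    run = 0
    for flag in considered:
        if not flag:
            break
        run += 1
    return run


# One side is in check every time it is to move; the other side never is.
line = [True, False] * 10
print(longest_check_run(line, singlecolor=True))
print(longest_check_run(line, singlecolor=False))
```

Without singlecolor the run is broken immediately by the intervening positions, so a quantifier such as {10,} could never be satisfied by this line.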

Move Linearization

Move linearization refers to the special handling of ordinary move filters (those that do not specify previous, legal, or pseudolegal) when they appear in a constituent of a line filter. In particular, when there are multiple moves recorded at the position being evaluated for a line filter constituent (a situation that only occurs in games that have variations), the move filter behaves as though only the move that leads to the next position in the line currently being processed by the line filter exists. For example, in the game:

e4 (d4 d5) e5 *

there are two recorded lines, the primary line e4 e5 and the variation d4 d5. If move linearization did not occur, then the below query would match the above game:

initial
line --> move to e4 --> move to d5

At the initial position, there are two moves e4 and d4 so, without move linearization, the first constituent would always match at the initial position. The second constituent would then match at the position that follows 1.d4 (the next move in this position is d5). Since there is no line in which e4 d5 was played, the result would be unhelpful at best. To solve this problem, an ordinary move filter evaluated in a line filter constituent will only consider the move that was played next in the line that is currently being processed by the line filter. At the initial position, the line filter will process each of the two lines individually and only the main line will match the first constituent.
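The difference can be reproduced with a toy model of the game tree. In the Python sketch below, the dictionary encoding of the game and the function names are hypothetical, and each move is identified only by its destination square; the sketch merely contrasts the two behaviors described above.

```python
from typing import Dict, List

# Hypothetical encoding of the game "e4 (d4 d5) e5": each position maps the
# destination square of a recorded move to the resulting position.
tree: Dict[str, Dict[str, str]] = {
    "start": {"e4": "after_e4", "d4": "after_d4"},
    "after_e4": {"e5": "after_e5"},
    "after_d4": {"d5": "after_d5"},
    "after_e5": {},
    "after_d5": {},
}


def lines(position: str) -> List[List[str]]:
    """Enumerate every recorded line from `position` as a list of moves."""
    if not tree[position]:
        return [[]]
    return [[move] + rest
            for move, child in tree[position].items()
            for rest in lines(child)]


def line_matches(line: List[str], targets: List[str], linearize: bool) -> bool:
    """Check `move to X` constituents against successive positions of one line."""
    if len(line) < len(targets):
        return False
    position = "start"
    for played, target in zip(line, targets):
        # Linearized: only the move played in this line is visible.
        candidates = [played] if linearize else list(tree[position])
        if target not in candidates:
            return False
        position = tree[position][played]
    return True


def query_matches(targets: List[str], linearize: bool) -> bool:
    return any(line_matches(line, targets, linearize) for line in lines("start"))


# Models: line --> move to e4 --> move to d5
print(query_matches(["e4", "d5"], linearize=True))
print(query_matches(["e4", "d5"], linearize=False))
```

With linearization the query fails because no recorded line plays e4 followed by d5; without it, the first constituent sees e4 at the initial position even while the d4 line is being followed, so the query matches.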

Move linearization is suppressed within line filter constituents in the following situations:

  • When the nolinearize parameter is specified for the corresponding line filter. In a nested line filter, the presence of nolinearize in the outer line filter does not suppress linearization for constituents in an inner line filter.
  • Within the body of a with-position filter.
  • Within the body of a find or echo filter.
  • When the current position in which the move filter is evaluated is different from the current position in which the line filter is evaluated.

Atomic Evaluation

Modifications of non-dictionary variables inside line filter constituents are performed atomically with respect to the candidate sequence being evaluated unless the nonatomic parameter is specified.

Variable modifications include (simple, compound, and slicing) assignment and variable disassociation via the unbind filter. Atomic evaluation ensures a consistent and isolated variable state during evaluation of any particular candidate sequence and well-defined values for modified variables at the end of evaluation of the line filter.

For example, the below query will find games with a sequence of eight or more consecutive captures:

sort "Consecutive captures"
line nestban --> move capture . {+} >= 8

This query can be modified to obtain the set of squares on which these captures occurred, e.g.:

$capture_squares = []
sort "Consecutive captures"
line nestban --> {
    $result =? move capture .
    $capture_squares |= $result } {+} >= 8

The $capture_squares variable is used to store the squares on which captures occur during the evaluation of the line filter. Atomic evaluation ensures that $capture_squares always represents only the captures seen in the current candidate sequence, even for games that contain multiple variations that would cause $capture_squares to be modified. In particular, the captures that occur in one variation will not affect the value of the $capture_squares variable while a different variation is being processed. At the end of the query, the value of $capture_squares will hold the set of squares on which captures occurred during the longest matching sequence identified by the line filter. Atomic evaluation supports arbitrary variable modifications within nested line filters.

In the below game from the HHdbVI endgame database:

[Event "Europa Rochade#0370"]
[Site "?"]
[Date "1985.??.??"]
[Round "?"]
[White "Jahn=G Geisdorf=H"]
[Black "(=0374.32d5e8)"]
[Result "1/2-1/2"]
[SetUp "1"]
[FEN "2n1kN2/2rb2p1/1PB2b1p/2PKP3/8/8/8/8 w - - 0 1"]
[PlyCount "14"]
[EventDate "1985.??.??"]

{Europa Rochade=10 Europa Rochade/10.} 1.Bxd7+ $1 (1.Nxd7 $2 1...Ne7+ $1 2.Ke4
Rxd7 $1 3.Bb5 Bxe5 4.Kxe5 Kd8 5.c6 Nxc6+ $1 6.Bxc6 Rf7) (1.exf6 $2 1...Bxc6+ $1
2.Ke6 Kxf8 3.bxc7 g5 $1) 1...Rxd7+ $1 (1...Kxf8 2.bxc7 $1) 2.Nxd7 Kxd7 3.c6+ $1
(3.exf6 $2 3...Nxb6+ 4.cxb6 gxf6 5.b7 Kc7 6.Ke6 h5 $1 7.Kf5 (7.Kxf6 h4 $1) 7...
Kxb7) 3...Ke8 $1 (3...Kd8 4.c7+ $1 4...Kd7 (4...Ke8 5.b7 $1) 5.e6+ Ke7 6.b7 $1)
4.b7 $1 4...Ne7+ 5.Ke6 $1 (5.Kd6 $2 5...Bxe5+ $1 6.Kxe5 Nxc6+) 5...Bxe5 (5...
Nxc6 {<main>} 6.exf6 $1 6...g5 (6...Nd8+ 7.Kf5 $1 7...Nxb7 8.fxg7 $1 8...Kf7 9.
g8=Q+ Kxg8 10.Kg6) 7.f7+ $1 7...Kf8 8.Kd6 Nb8 9.Kc7 Na6+ 10.Kb6 $1) 6.c7 $1 (6.
Kxe5 $2 6...Nxc6+ $1) 6...Bxc7 7.b8=Q+ Bxb8 1/2-1/2

the line having the longest series of consecutive captures is:

1.Bxd7+ Rxd7+ 2.Nxd7 Kxd7 3.exf6 Nxb6+ 4.cxb6 gxf6

for which $capture_squares will have the value [b6,f6,d7]. If atomic evaluation is suppressed with the nonatomic parameter, the value of $capture_squares after this sequence is matched will instead be [b6,c6,f6,c7,d7,f8], which includes captures that occurred in other variations processed by the line filter in the same position.
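The two results can be reproduced with a simplified Python model. The candidate capture runs below were read off the PGN above by hand, and the way the state is isolated (a fresh set per candidate) is an assumption made for illustration; it is not a description of how CQLi explores lines or implements atomic evaluation.

```python
from typing import List, Set

# Destination squares of consecutive captures from the initial position,
# one list per candidate line (transcribed by hand from the game above).
candidates: List[List[str]] = [
    ["d7"],                                            # 1.Nxd7
    ["f6", "c6"],                                      # 1.exf6 Bxc6+
    ["d7", "f8", "c7"],                                # 1.Bxd7+ Kxf8 2.bxc7
    ["d7", "d7", "d7", "d7"],                          # main line up to 2...Kxd7
    ["d7", "d7", "d7", "d7", "f6", "b6", "b6", "f6"],  # ... 3.exf6 Nxb6+ 4.cxb6 gxf6
]


def evaluate(lines: List[List[str]], atomic: bool) -> Set[str]:
    """Return the modeled value of $capture_squares after the longest run is selected."""
    shared: Set[str] = set()
    best_length, best_state = 0, set()
    for squares in lines:
        # Atomic: every candidate starts from its own isolated state.
        # Non-atomic: every candidate writes into the same shared state.
        state: Set[str] = set() if atomic else shared
        for square in squares:
            state.add(square)
        if len(squares) > best_length:
            best_length, best_state = len(squares), state
    return set(best_state)


print(sorted(evaluate(candidates, atomic=True)))
print(sorted(evaluate(candidates, atomic=False)))
```

In this model the atomic result contains only the squares b6, d7, and f6 from the longest run, while the non-atomic result also picks up c6, c7, and f8 from the other candidates, matching the two values quoted above.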