Mon, 16 Aug 2010

The State of Regex Modifiers in Rakudo

Over the last month and a half, I've been working on making regex modifiers easily available in Rakudo.

The regex compiler itself has to support only a few of the adverbs that can be applied to regexes; those include :ignorecase, :sigspace, :ignoremark and :continue/:pos. NQP-rx, the regex engine that Rakudo uses under the hood, supports those (except :ignoremark), so previously you could write

if 'ABC' ~~ /:i abc/ {
    say "case insensitive match";
}

But not

if 'ABC' ~~ rx:i/abc/ {
    say "case insensitive match";
}

nor m:i/abc/, for that matter.

I've patched Rakudo to actually recognize those adverbs outside of the regex, and also for s/// substitutions.

Another category of adverbs applies to regex calls rather than to the compilation of a regex. Among those are :global/:g, :overlap/:ov, :nth($n) and :x. I've implemented those for substitutions, but implementing them for m// turns out to be quite a bit harder.

The reason is the return value: each regex match returns a Match object, which can store positional and named parts. S05 says that regex matches with multiple results should return a single match object, with all results as positional parts. It can be distinguished from a normal match object by evaluating it in slice context... which Rakudo doesn't support yet.
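To make "positional and named parts" concrete, here is a minimal sketch of a single match that carries one of each. The input string, the regex and the capture name digits are made up for illustration; they are not taken from Rakudo or its tests.

```perl6
# Illustrative only: the regex and the name 'digits' are assumptions.
if 'ab12' ~~ / (<[a..z]>+) $<digits>=(\d+) / {
    say ~$0;          # positional part of the Match, stringified
    say ~$<digits>;   # named part of the Match, stringified
}
```

A match with multiple results is supposed to fill those same positional slots with whole matches instead of subcaptures, which is why the two cases look alike from the outside.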

Now the subst method, and thus s///, is implemented by calling .match(:global, ...), and without slice context it can't distinguish between multiple matches and a single match with subcaptures. So my changes to the global match broke the substitution, and I see no easy way to fix it.

Anyway, here are a few examples of what works today:

$_ = 'ab12fg34';
s:g/\d/X/;
.say; # output: abXXfgXX

$_ = 'Hello, World';
# :ii is the same as :samecase
s:ii/world/perl/;
.say; # output: Hello, Perl

$_ = 'I did not know that that work together';
s:nth(2)/that/they/;
.say; # output: I did not know that they work together
