Categories

Posts in this category

Sat, 09 Oct 2010

Longest Palindrome by Regex


Permanent link

A few hours ago colomon blogged a Perl 6 solution to the longest palindrome programming challenge. It was a bit slow.

An all-regex solution looks like this:

for $string.match(:g, /(\w+) \w? <{ $0.flip }> /) {
    # process palindrome match here
    # filter out the longest match
}

It's also very slow, and it's very wasteful. So, can there be a hybrid between manual search a regex?

The regex engine is quite fast, and using it to find the center of a palindrome is a good starting point. From there on we can can move to both sides, and compare character by character if moving away from the match still results in a palindrome:

my $s = 'Fourscoreandsevenyearsagoourfaathers...';

my $solution;
my $longest = 0;
for $s.match(:g, /(\w)\w?$0/) {
    # $_ now holds a Match object,
    my $left = .from - 1;
    my $right = .to;

    # go to left and right while palindromic
    while $s.substr($left, 1) eq $s.substr($right, 1) {
        $left--;
        $right++;
    }

    # we move a bit too far
    $left++;
    $right;

    # and we're only interested in the longest
    # palindromic substring
    my $len = $right - $left;
    if $len > $longest {
        $solution = $s.substr($left, $right - $left);
        $longest = $len;
    }
}
say $solution;

This now runs in under 1.5 seconds on the 1169 characters long example input string.

[/perl-6] Permanent link

Comments / Trackbacks:

Trackback URL: /blog-en/perl-6/longest-palindrome-regex.trackback

wrote


There are typos in that input string.

Tim Bollman wrote

Slight Problems
Due to how the regex engine works, your script has minor problems with finding the center of things that can be considered a palindrome with many centers. For example, use 'abbbbc'. Your script will use 'bbb' as the longest palindrome because of two regex engine particulars. The first is that your regex treats the second part as part of the match, so the next match will start at the end of the current one. Thus your program will try only 'a' 'bbb' 'bc'. When it next tries to match, the remaining string is just 'bc' which doesn't match. In perl5, you could update this using the pos function, I don't know how to do so in perl6. The second is that the perl regex will only give you back one of the two posibilities for a given position if there are two. In this case, it's always the three character version because it's longer than two. This should be more accurate although it doesn't get to use the regex engine.

for 0 .. $s.chars - 2 -> $left
{
if $s.substr($left, 1) eq $s.substr($left + 1, 1) {
my $right = $left + 1;
# do palindrome match
}
if $s.substr($left, 1) eq $s.substr($left + 2, 1) {
my $right = $left + 2;
# do palindrome match
}
}

Klausens wrote


I also wrote a regexp-less version, mainly because i felt 1,5sek a very long time for parsing over a quite short text.

After my script returned the right password I still found several bugs in my code.
The text in the challenge is choosen quite unlucky in my opinion if several buggy scripts anyway return the right result.

Tim Bollman wrote

@Klausens
The 1.5 seconds isn't a statement on the speed of regular expressions, it's a statement on the current speed of Rakudo (not that this should dissuade you from trying perl6, it's awesome). If anything, the regular expression would probably be very slightly faster than the regexp-less version in perl because the regexp given to it should be pretty easy to optimize and presumably the number of potential palindromes is low compared to the number of characters in the string. For comparison, the literal translation of this code into perl5 runs in under a millisecond.

At least for the bugs I gave here, the reason they weren't uncovered in the challenge was because they are very hard to trigger. The challenge creators would have had to create the text specifically to trigger on this if they wanted to catch regexp solutions, because not only does it require the behavior to trigger, it requires it to trigger on the longest palindrome.

Tim Bollman wrote

Regex Improvements
You should be able to change the regex to /(\w)\w$0 | (\w)$0+/ and have it work in all cases (assuming the match re-position issue is handled). Perl5 would require you to say the second character in the first match was not equal to the first character (and to change the $0's to \1 and \2), but Perl6 doesn't need this because it has longest token matching.

Essentially, if there's a sequence of more than one of a letter, you know that you have to pick all of them or else the letter you didn't pick won't match up when you do the extension loop.

It additionally changes the reposition logic from left + 1 to being right - 1 unless it's a 2 character match in which case it's right so you don't infinite loop.

Tim Bollman wrote

Regex Improvements
You should be able to change the regex to /(\w)\w$0 | (\w)$0+/ and have it work in all cases (assuming the match re-position issue is handled). Perl5 would require you to say the second character in the first match was not equal to the first character (and to change the $0's to \1 and \2), but Perl6 doesn't need this because it has longest token matching.

Essentially, if there's a sequence of more than one of a letter, you know that you have to pick all of them or else the letter you didn't pick won't match up when you do the extension loop.

It additionally changes the reposition logic from left + 1 to being right - 1 unless it's a 2 character match in which case it's right so you don't infinite loop.

Moritz wrote

Overlapping match
Tim, you're absolutely right. I should add the :overlap modifier to the regex, which should make it work again.

Write a comment

The comments on this blog post have been disabled; the comment form below will not work.

 
Name:
URL: [http://www.example.com/] (optional)
Title: (optional)
Comments:
Save my Name and URL/Email for next time