Most Ruby programmers know how to get substrings out of strings using the techniques described in the Ruby core documentation for the String class. For reference, I quote:
    a = "hello there"
    a[1]                   #=> 101
    a[1,3]                 #=> "ell"
    a[1..3]                #=> "ell"
    a[-3,2]                #=> "er"
    a[-4..-2]              #=> "her"
    a[12..-1]              #=> nil
    a[-2..-4]              #=> ""
    a[/[aeiou](.)\1/]      #=> "ell"
    a[/[aeiou](.)\1/, 0]   #=> "ell"
    a[/[aeiou](.)\1/, 1]   #=> "l"
    a[/[aeiou](.)\1/, 2]   #=> nil
    a["lo"]                #=> "lo"
    a["bye"]               #=> nil
I almost always use the string[ /regexp/, 1 ] method, myself. However, all of the above only let you extract one substring. We can do multi-variable assignment in Ruby:
    a, b, c = 1, 2, 3   # a == 1; b == 2; c == 3
    a, b = b, a         # a == 2; b == 1
So why can’t we do multi-variable substring extraction? Ah, but we can! As described in the MatchData documentation:
multi-variable substring extraction
    all, first, second = *( /(\w+) +(\w+)/.match "Here is a sentence." )
    # all == "Here is"
    # first == "Here"
    # second == "is"
Get the same result the other way around, if this ordering makes more sense to you:
    all, first, second = *( "Here is a sentence.".match /(\w+) +(\w+)/ )
    # all == "Here is"
    # first == "Here"
    # second == "is"
Now, what we really need is a way to do it without having to give a container for the whole match (“all” in the above code), preferably something with more concise syntax. Wouldn’t this be neat:
    # This is not valid Ruby code!
    first, second = "Here is a sentence."[ /(\w+) +(\w+)/ ]
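In the meantime, a rough approximation is possible with MatchData#captures, which returns only the groups and leaves out the whole match. This is just a sketch: the helper name `groups` is made up for illustration, and falling back to an empty array when nothing matches is only one possible choice.

```ruby
# Hypothetical helper (not part of Ruby): returns only the captured groups,
# without the whole match, or an empty array if the regexp does not match.
def groups(string, regexp)
  match = string.match(regexp)      # MatchData, or nil on failure
  match ? match.captures : []
end

first, second = groups("Here is a sentence.", /(\w+) +(\w+)/)
# first == "Here"
# second == "is"
```

With the empty-array fallback, a failed match simply leaves both variables set to nil instead of raising an error.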
first, second = 'this is a'.scan(/w+/)[0,2]
["this", "is"]
manveru:
Your backslash got swallowed up by my Markdown plugin:
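For reference, with the backslash restored the snippet presumably reads as follows. This is a reconstruction, assuming the swallowed character was the backslash in `\w`, not the original comment text.

```ruby
# Reconstruction (assumed): the commenter's snippet with \w restored.
words = 'this is a'.scan(/\w+/)   # every run of word characters
first, second = words[0, 2]       # keep only the first two matches
# first == "this"
# second == "is"
```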
But anyway: Thanks for pointing out this technique. However, I’m not sure it would let us grab arbitrary regexp groups, such as:
Yeah, I noticed that but didn’t want to double-post.
To parse something like this I would use:
But I get your point, groups would be quite handy at times instead of:
This time I even previewed the post.
One thing I see, though, is that Markdown doesn’t work like the docs say: 4 spaces of indentation should result in a code block.
Hrm:
Seems to work for me? The way I’m getting the highlighted code blocks is actually via the syntax highlight WordPress plugin. Add code blocks on my blog with <pre lang="ruby">.
Yes, but it still tries to remove the \
Oh, I see. Well… I guess we need to use <pre> and preview.
Ruby 1.9 has named captures. With this we can name capture groups in a regexp, and then local variables are assigned the captured strings.
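As a small sketch of that, reusing the regexp from the post with named groups (the method name `first_two_words` is hypothetical). Note that the local variables are only assigned when a regexp literal sits on the left-hand side of `=~`.

```ruby
# Hypothetical helper name; the regexp is the one from the post, with
# its two groups given names.
def first_two_words(sentence)
  # Named groups become local variables only when the regexp literal
  # is the left-hand operand of =~.
  if /(?<first>\w+) +(?<second>\w+)/ =~ sentence
    [first, second]
  end
end

first, second = first_two_words("Here is a sentence.")
# first == "Here"
# second == "is"
```

When the sentence does not match, the method returns nil and both variables end up nil.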