Using RegExp.exec with a parenthesized regex to extract matches from a string
Sometimes you don't want to simply replace or remove a string; you want to extract the matches and process them. Here is an example of how to manipulate matches.
What is a match? When a substring compatible with the entire regex is found in the string, exec produces a match. A match is an array made up of the whole matched substring first, followed by the substring captured by each pair of parentheses in the regex.
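As a minimal illustration of that array shape, here is a single exec call on a regex with two pairs of parentheses. The sample string and the names dateRe and m are made up for this sketch.

```javascript
// Hypothetical sample: a regex with two capturing groups.
var dateRe = /(\d{4})-(\d{2})/;
var m = dateRe.exec('Released 2016-07, updated later');

console.log(m[0]);    // whole match: "2016-07"
console.log(m[1]);    // first parenthesis: "2016"
console.log(m[2]);    // second parenthesis: "07"
console.log(m.index); // position of the match in the string: 9

// exec returns null when nothing matches.
console.log(dateRe.exec('no date here')); // null
```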
Imagine an HTML string:
<html>
<head></head>
<body>
<h1>Example</h1>
<p>Look at this great link : <a href="https://stackoverflow.com">Stackoverflow</a> http://anotherlinkoutsideatag</p>
Copyright <a href="https://stackoverflow.com">Stackoverflow</a>
</body>
</html>
You want to extract all the links inside an <a> tag. At first, here is the regex you write:
var re = /<a[^>]*href="https?:\/\/.*"[^>]*>[^<]*<\/a>/g;
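As a quick check of what this first regex gives you, the sketch below runs it over a shortened version of the sample with String.prototype.match. The names sample and wholeLinks are chosen for the illustration. With the g flag, match returns only the whole matched substrings, which is why it is not enough once you need the parts of each link.

```javascript
var re = /<a[^>]*href="https?:\/\/.*"[^>]*>[^<]*<\/a>/g;

// Shortened, illustrative version of the sample document.
var sample =
  '<p>Look at this great link : <a href="https://stackoverflow.com">Stackoverflow</a> http://anotherlinkoutsideatag</p>\n' +
  'Copyright <a href="https://stackoverflow.com">Stackoverflow</a>';

// match returns null when nothing is found, so fall back to an empty array.
var wholeLinks = sample.match(re) || [];

console.log(wholeLinks.length); // 2
console.log(wholeLinks[0]);     // <a href="https://stackoverflow.com">Stackoverflow</a>
```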
But now, imagine you want the href and the anchor text of each link, and you want them together. You could run a second regex on each match, or you can use parentheses:
var re = /<a[^>]*href="(https?:\/\/.*)"[^>]*>([^<]*)<\/a>/g;
var str = '<html>\n <head></head>\n <body>\n <h1>Example</h1>\n <p>Look at this great link : <a href="https://stackoverflow.com">Stackoverflow</a> http://anotherlinkoutsideatag</p>\n\n Copyright <a href="https://stackoverflow.com">Stackoverflow</a>\n </body>\n</html>';
var m;
var links = [];
// With the g flag, each exec call resumes at re.lastIndex and returns null when no match is left
while ((m = re.exec(str)) !== null) {
    // Guard against an infinite loop on zero-length matches
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    console.log(m[0]); // The whole matched substring
    console.log(m[1]); // The href subpart
    console.log(m[2]); // The anchor subpart
    links.push({
        match : m[0],  // the entire match
        href : m[1],   // the first parenthesis => (https?:\/\/.*)
        anchor : m[2]  // the second one => ([^<]*)
    });
}
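One caveat worth sketching: the .* inside the href parenthesis is greedy, so when two links share a line, the first group can run on to the last double quote on that line. The variant below swaps .* for [^"]*, which is a suggested adjustment and not part of the original regex. The names oneLine, collect, greedy, and strict are illustrative.

```javascript
// Illustrative input: two links on the same line.
var oneLine = '<a href="https://a.example">A</a> and <a href="https://b.example">B</a>';

// Collect { href, anchor } pairs for every match of a global regex.
function collect(re, text) {
  var out = [];
  var m;
  re.lastIndex = 0; // start from the beginning on every call
  while ((m = re.exec(text)) !== null) {
    out.push({ href: m[1], anchor: m[2] });
  }
  return out;
}

var greedy = /<a[^>]*href="(https?:\/\/.*)"[^>]*>([^<]*)<\/a>/g;
var strict = /<a[^>]*href="(https?:\/\/[^"]*)"[^>]*>([^<]*)<\/a>/g;

console.log(collect(greedy, oneLine).length);  // 1
console.log(collect(greedy, oneLine)[0].href); // https://a.example">A</a> and <a href="https://b.example
console.log(collect(strict, oneLine).length);  // 2
console.log(collect(strict, oneLine)[1].href); // https://b.example
```

On the sample document above, each link sits on its own line and . does not match a newline, so both regexes return the same two links there.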
At the end of the loop, you have an array of links, each with its anchor and href, and you can use it to write Markdown, for example:
links.forEach(function(link) {
    console.log('[%s](%s)', link.anchor, link.href);
});
To go further:
- Nested parentheses
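As a brief taste of nested parentheses, the sketch below puts two groups inside a third. Groups are numbered by the position of their opening parenthesis, so the outer group comes first. The regex and the names nested and parts are assumptions made for this illustration.

```javascript
// Outer group 1 wraps the whole URL; groups 2 and 3 sit inside it.
var nested = /href="((https?):\/\/([^"\/]*))"/;
var parts = nested.exec('<a href="https://stackoverflow.com">Stackoverflow</a>');

console.log(parts[1]); // outer group: "https://stackoverflow.com"
console.log(parts[2]); // first inner group: "https"
console.log(parts[3]); // second inner group: "stackoverflow.com"
```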