I'm developing a web app that has a quick-search feature for articles.
In short, the structure is:
So, when the user types a query into the pop-up quick-search field, the app pushes the matches to a temporary search-results array (with a cache). As you can see, the original array is not modified.
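The flow described above might look roughly like this. It is a minimal sketch that assumes each article is an object with an html field and that results are cached per lower-cased query; the article shape, quickSearch and resultCache are hypothetical names, and matching uses plain String.indexOf on the raw HTML:

```javascript
// Hypothetical article shape: { id, html }
// Hypothetical cache: lower-cased query -> array of matching articles
const resultCache = new Map();

function quickSearch(articles, query) {
  const key = query.toLowerCase();
  if (resultCache.has(key)) return resultCache.get(key);
  // filter() builds a new array, so `articles` itself is never modified
  const results = articles.filter(
    (a) => a.html.toLowerCase().indexOf(key) !== -1
  );
  resultCache.set(key, results);
  return results;
}

const articles = [
  { id: 1, html: "<p>Item one</p>" },
  { id: 2, html: "<p><b>Ite</b>m two</p>" },
];
console.log(quickSearch(articles, "item").map((a) => a.id)); // [ 1 ]
```

Article 2 is missed because the </b> tag splits "Item" in its HTML, which is exactly the limitation of indexOf on formatted text. Note that this cache is keyed by query only, so it would need clearing whenever the article list changes.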
Currently I'm using a primitive String.indexOf, but it cannot match text that is broken up by HTML formatting tags (example below):
My question is about RegEx patterns. I understand that using RegEx to manipulate the DOM is not recommended, and that the expected results below are not semantically correct, but they fit my needs.
For example, suppose we have something like this:
<ul><li>Item <i><span style="color:red">Y</span></i></li></ul>
and we need to highlight the query e. The expected result is ... It<em>e</em>m ..., but a trivial replace(/e/ig, '<em>$&</em>') also replaces the e characters inside style="color:red".
In other words, what RegEx pattern leaves the text inside tags untouched?
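One common regex trick for this is an alternation that matches a whole tag first and hands it back unchanged, so only text outside tags gets wrapped. A minimal sketch, assuming well-formed markup whose attribute values never contain a literal ">" and a query that contains no "<"; highlightOutsideTags is a hypothetical helper name:

```javascript
// Hypothetical helper: wrap every case-insensitive occurrence of `term`
// in <em>, but only in text that lies outside HTML tags.
// Assumes attribute values never contain ">" and `term` contains no "<".
function highlightOutsideTags(html, term) {
  // Escape regex metacharacters so the query is matched literally
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // First alternative matches a complete tag, second matches the term.
  // At each "<" the tag alternative is tried first, so a tag is consumed
  // as a whole and the term is never searched inside it.
  const re = new RegExp("(<[^>]*>)|" + escaped, "gi");
  return html.replace(re, (match, tag) =>
    // `tag` is undefined when the term alternative matched
    tag !== undefined ? tag : "<em>" + match + "</em>"
  );
}

const input = '<ul><li>Item <i><span style="color:red">Y</span></i></li></ul>';
console.log(highlightOutsideTags(input, "e"));
// <ul><li>It<em>e</em>m <i><span style="color:red">Y</span></i></li></ul>
```

This handles the first example; it does not match a query that is split by tags, such as Item Y.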
Second example: we need to highlight Item Y, so the expected result is <ul><li><em>Item <i><span style="color:red">Y</em></span></i></li></ul>
If I understand correctly, you need to search within the text contents of a DOM fragment. One way to do that is to search the fragment's text content (with all markup stripped) rather than its HTML. This example uses jQuery, but the idea is easily portable to other libraries:
HTML:
<div id="article_contents">
Blah blah blah, Item 1, Item 2 blah blah <b>Ite</b>m <span>1</span> blah blah
</div>
JavaScript:
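// Text of the article with all markup stripped
var source = jQuery('#article_contents').text();
// The g flag makes match() return every occurrence
var queryRegexp = new RegExp('Item 1', 'g');
// Array of all matches, or null if there are none
var results = source.match(queryRegexp);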
Now results holds every occurrence of your search string. Of course, to highlight the results you have to go a few steps further, for example using RegExp.exec to get the offsets of the matches.
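With the g flag, RegExp.exec can be called in a loop to collect the start and end offset of every match. A minimal sketch; findMatchOffsets is a hypothetical name, the query is matched literally and case-insensitively, and the offsets refer to the plain text returned by .text(), not to positions in the HTML:

```javascript
// Hypothetical helper: return [start, end) offsets of every match of
// `query` in plain `text`.
function findMatchOffsets(text, query) {
  const escaped = query.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp(escaped, "gi");
  const offsets = [];
  let m;
  // With the g flag, exec resumes from re.lastIndex on each call
  // and returns null once there are no more matches.
  while ((m = re.exec(text)) !== null) {
    offsets.push([m.index, m.index + m[0].length]);
    // Guard against an infinite loop on zero-length matches
    if (m[0].length === 0) re.lastIndex++;
  }
  return offsets;
}

// Roughly what .text() returns for the sample div (outer whitespace omitted)
const source = "Blah blah blah, Item 1, Item 2 blah blah Item 1 blah blah";
console.log(findMatchOffsets(source, "Item 1")); // [ [ 16, 22 ], [ 41, 47 ] ]
```

Mapping these text offsets back to positions in the HTML is the remaining step before anything can be wrapped.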
A short, hackish solution is to allow markup between every single letter of the search string. If your keyword is "search", the pattern would look like this:
(s)(<[^>]*>)*(e)(<[^>]*>)*(a)(<[^>]*>)*(r)(<[^>]*>)*(c)(<[^>]*>)*(h)
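Writing this pattern by hand for every keyword gets tedious, so it can be generated instead. A minimal sketch, assuming a tag is "<", then any characters other than ">", then ">"; buildTagTolerantRegExp is a hypothetical name, each letter is escaped so characters such as "." are matched literally, and unlike the hand-written pattern the letters and tags are not captured:

```javascript
// Hypothetical helper: build a RegExp that matches `keyword` even when
// HTML tags appear between its letters, e.g. "s</b>earch".
function buildTagTolerantRegExp(keyword, flags = "gi") {
  const escapeChar = (c) => c.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Zero or more complete tags, non-capturing
  const tags = "(?:<[^>]*>)*";
  const pattern = Array.from(keyword).map(escapeChar).join(tags);
  return new RegExp(pattern, flags);
}

const html = "these <tag> are </tag> my <i><b>s</b>earch keywords</i>";
const re = buildTagTolerantRegExp("search");
console.log(html.match(re)); // [ 's</b>earch' ]
```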
But in reality you need to do more than that, because:
display:none, visibility:hidden, etc.
Take "these <tag> are </tag> my <i><b>s</b>earch keywords</i>": if you're supposed to wrap "my search" in <span> tags in that markup (without actually wrapping every single character), you'll end up with a closing </span> in the middle of some other tag, on a different DOM tree level. - Silviu-Marian 2012-04-04 23:09
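One way around the mis-nesting problem is to wrap each text run of a match separately, so every wrapper opens and closes on the same DOM level. A minimal sketch that combines a per-letter tag-tolerant pattern with a tag-first alternation, so nothing inside a tag is searched; highlightAcrossTags is a hypothetical name, it wraps with <em> as in the question, and it assumes attribute values never contain ">" and the keyword contains no "<". It does not address hidden elements:

```javascript
// Hypothetical helper: wrap every match of `keyword` in <em>, even when
// tags sit between its letters, without producing mis-nested markup.
// Assumes attribute values never contain ">" and `keyword` has no "<".
function highlightAcrossTags(html, keyword) {
  const escapeChar = (c) => c.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const tagRun = "(?:<[^>]*>)*"; // zero or more whole tags between letters
  const keywordPattern = Array.from(keyword).map(escapeChar).join(tagRun);
  // Tags are tried first and returned unchanged, so attribute text is
  // never searched.
  const re = new RegExp("(<[^>]*>)|" + keywordPattern, "gi");
  return html.replace(re, (match, tag) => {
    if (tag !== undefined) return tag;
    // Splitting on a capturing group alternates text (even index) and
    // tag (odd index); wrap each non-empty text run on its own level.
    return match
      .split(/(<[^>]*>)/)
      .map((part, i) =>
        i % 2 === 1 || part === "" ? part : "<em>" + part + "</em>"
      )
      .join("");
  });
}

const sample = "these <tag> are </tag> my <i><b>s</b>earch keywords</i>";
console.log(highlightAcrossTags(sample, "my search"));
// these <tag> are </tag> <em>my </em><i><b><em>s</em></b><em>earch</em> keywords</i>
```

Applied to the question's second example with the keyword "item y", this yields <ul><li><em>Item </em><i><span style="color:red"><em>Y</em></span></i></li></ul>, a well-formed variant of the expected result.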