How to highlight text within & out of tags?


1

I'm developing a web app that has a quick-search feature for articles.

In short, the structure is:

  • Page
  • Global array (JSON, 100-150 items) of articles, fetched via AJAX, with the fields id, title, and snippet. Title and snippet may contain simple style markup tags.

So, when the user types a query into the quick-search popup field, the app:

  1. Searches the global array
  2. If matches are found, pushes them to a temporary search results array (with cache)
  3. Highlights the matches in the temporary results array and shows them to the user

As you can see, the original array is not modified.
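
To make the three steps concrete, here is a minimal sketch of such a pipeline. The sample data, the names articles, resultCache and quickSearch, and the idea of passing the highlighter in as a callback are all assumptions made for illustration; the matching itself is the plain String.indexOf approach, so it shares the limitation discussed in this question.

```javascript
// Hypothetical data shaped like the global array (id, title, snippet).
const articles = [
  { id: 1, title: 'First <b>item</b>', snippet: 'Plain snippet text' },
  { id: 2, title: 'Second', snippet: 'Nothing here' },
];

// Assumed meaning of "with cache": one results array per lower-cased query.
const resultCache = new Map();

// highlight is a function (html, query) => html supplied by the caller.
function quickSearch(query, highlight) {
  const key = query.toLowerCase();
  if (!resultCache.has(key)) {
    const results = articles
      // Step 1: naive substring search over title and snippet.
      .filter(a => (a.title + ' ' + a.snippet).toLowerCase().indexOf(key) !== -1)
      // Steps 2 and 3: build new objects, so the originals stay untouched.
      .map(a => ({
        id: a.id,
        title: highlight(a.title, query),
        snippet: highlight(a.snippet, query),
      }));
    resultCache.set(key, results);
  }
  return resultCache.get(key);
}
```

Because the highlighted strings live only in the copies stored in the cache, the global array can be searched again without any cleanup.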

Currently I'm using a primitive String.indexOf, but it cannot match text that is split up by HTML formatting tags (example below).

The question is about RegEx patterns. I clearly understand that using RegEx to manipulate the DOM is not recommended, and that the expected results below aren't semantically correct, but it fits my needs.

For example, we have something like this:

<ul><li>Item <i><span style="color:red">Y</span></i></li></ul>

and we need to highlight the query e. The expected result is ... It<em>e</em>m ..., but a trivial replace(/e/ig, '<em>$&</em>') will also replace the e in style="color:red".

I.e., what RegEx pattern leaves the text inside tags untouched?
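
One common way to get this behaviour, sketched here under the assumption that the markup is as simple as described (no ">" inside attribute values, no scripts or comments), is to let the pattern match whole tags as a first alternative and hand them back unchanged. The function names escapeRegExp and highlightOutsideTags are made up for this example.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Wrap every occurrence of query in <em>, leaving anything inside <...> alone.
function highlightOutsideTags(html, query) {
  if (!query) return html; // an empty query would match everywhere
  // Alternative 1 captures a whole tag, alternative 2 is the query itself.
  const pattern = new RegExp('(<[^>]*>)|' + escapeRegExp(query), 'gi');
  return html.replace(pattern, (match, tag) =>
    // If the tag group took part in the match, return the tag unchanged.
    tag !== undefined ? tag : '<em>' + match + '</em>');
}

const sample = '<ul><li>Item <i><span style="color:red">Y</span></i></li></ul>';
console.log(highlightOutsideTags(sample, 'e'));
// <ul><li>It<em>e</em>m <i><span style="color:red">Y</span></i></li></ul>
```

Since a tag is always consumed as a whole before the query alternative gets a chance, the e in style="color:red" is never reached. This only covers matches that sit entirely within one text run.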


Second example: we need to highlight Item Y, so the expected result is <ul><li><em>Item <i><span style="color:red">Y</em></span></i></li></ul>

2012-04-04 17:33
by mjey
"I clearly understand that it's not recommended to use RegEx" ... no, you obviously don't. Use an HTML parser - ocodo 2012-04-26 15:52


0

If I understood correctly, you need to search within the text contents of a fragment of a DOM tree. One way of achieving this is to use the XML/HTML text contents. This example makes use of jQuery, but the idea is easily portable to other libraries:

HTML:

<div id="article_contents">
Blah blah blah, Item 1, Item 2 blah blah <b>Ite</b>m <span>1</span> blah blah
</div>

JavaScript:

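// Plain text of the element, with all tags stripped
var source = jQuery('#article_contents').text();
var queryRegexp = new RegExp('Item 1', 'g');
// Array of all matched substrings, or null if nothing matched
var results = source.match(queryRegexp);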

Now results will hold all occurrences of your search string. Of course, to achieve your goal of highlighting the results you must go a few steps further (such as using RegExp.exec to get the offsets of the matches).
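
As a rough idea of what those further steps could look like, the sketch below works on an HTML string instead of a live DOM, so it runs without jQuery. The helper extractText is only a stand-in for .text() (it does not decode entities), and all function names are hypothetical. It records where each character of the plain text sits in the HTML, then uses RegExp.exec to collect the offsets of every match.

```javascript
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Drop the tags and remember the HTML position of every text character.
function extractText(html) {
  let text = '';
  const htmlIndex = []; // htmlIndex[i] is the position in html of text[i]
  let insideTag = false;
  for (let i = 0; i < html.length; i++) {
    const ch = html[i];
    if (ch === '<') insideTag = true;
    else if (ch === '>' && insideTag) insideTag = false;
    else if (!insideTag) { text += ch; htmlIndex.push(i); }
  }
  return { text, htmlIndex };
}

// Return text and HTML offsets for every occurrence of query.
function findMatches(html, query) {
  if (!query) return []; // an empty pattern would never advance lastIndex
  const { text, htmlIndex } = extractText(html);
  const queryRegexp = new RegExp(escapeRegExp(query), 'g');
  const matches = [];
  let m;
  while ((m = queryRegexp.exec(text)) !== null) {
    const last = m.index + m[0].length - 1; // last matched text character
    matches.push({
      textStart: m.index,
      htmlStart: htmlIndex[m.index],
      htmlEnd: htmlIndex[last] + 1, // exclusive
    });
  }
  return matches;
}
```

For the HTML above, html.slice(htmlStart, htmlEnd) of the second match of Item 1 spans Ite</b>m <span>1, which shows why the highlight cannot simply be one wrapper around that range: it would cross tag boundaries.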

2012-04-26 15:27
by wroniasty


-1

A short hackish solution is to look for markup between every single letter of the search string. If your keyword is "search" it would look like this:

(s)(<[^>]*>)*(e)(<[^>]*>)*(a)(<[^>]*>)*(r)(<[^>]*>)*(c)(<[^>]*>)*(h)
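
A pattern like this can be generated from the keyword rather than typed by hand. The sketch below assumes the tag-skipping part is meant to be (<[^>]*>)* and adds two things that are not part of the pattern above: a leading alternative that swallows whole tags, so a match never starts inside an attribute, and separate wrapping of each text run inside a match, so that no <em> straddles a tag boundary. All function names are invented for illustration.

```javascript
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build: (<tag>) | k (tags)* e (tags)* y ... for the letters of the keyword.
function buildCrossTagPattern(keyword) {
  const optionalTags = '(?:<[^>]*>)*';
  const letters = Array.from(keyword, ch => escapeRegExp(ch));
  return new RegExp('(<[^>]*>)|' + letters.join(optionalTags), 'gi');
}

function highlightAcrossTags(html, keyword) {
  if (!keyword) return html;
  return html.replace(buildCrossTagPattern(keyword), (match, tag) => {
    if (tag !== undefined) return tag; // a whole tag: leave it untouched
    // The match may contain tags; wrap only the text runs between them.
    return match.replace(/(<[^>]*>)|([^<]+)/g, (part, innerTag, text) =>
      innerTag !== undefined ? innerTag : '<em>' + text + '</em>');
  });
}

const sample = '<ul><li>Item <i><span style="color:red">Y</span></i></li></ul>';
console.log(highlightAcrossTags(sample, 'Item Y'));
// <ul><li><em>Item </em><i><span style="color:red"><em>Y</em></span></i></li></ul>
```

The output uses two <em> elements instead of one, which is a deliberate trade-off: the markup stays properly nested at the cost of splitting the highlight.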

But in reality you need to do more than that, because:

  • scripts
  • textareas
  • display:none, visibility:hidden, etc
2012-04-04 17:43
by Silviu-Marian
As I described above, there are no such tags in the data being searched, only text with simple style markup tags - mjey 2012-04-04 17:47
This regex doesn't work for the 1st example (it matches text inside a tag) and doesn't work for the 2nd example - mjey 2012-04-04 17:52
dummy comment for notify @grigore : - mjey 2012-04-04 18:14
I haven't tested at all, it was just how it's supposed to look like; see http://txt2re.com, match these <tag> are </tag> my <i><b>s</b>earch keywords</i> and if you're supposed to wrap my search in <span> tags in that markup (without actually wrapping every single character) you'll end up with a in the middle of some other tag, on a different DOM tree level - Silviu-Marian 2012-04-04 23:09