Regex - sentence starting with

-1

I have some text files with a whole bunch of info in them. Most of the sentences in them start with a label from a certain list of items. I can already extract some items with other regexes (for dates, URLs, emails, etc.), so I'm using those. But for the other info I have no idea where to start.

For example :

ITEM_LIST_1 = xxxx .
ITEM_LIST_2 = xxxx .
ITEM_LIST_3 = xxxx .
....

I'm looking to create a regex that will extract the xxxx part (the sentence) for specific items.

Thx all

2012-04-04 05:49
by Darth Blue Ray
What kind of regex? Java's regex? - cctan 2012-04-04 05:51
Yes, Java's regex - Darth Blue Ray 2012-04-04 05:53
What about ^[^\s]+ = ([^\s]+) - Oscar Mederos 2012-04-04 05:54
Use \S instead of [^\s] ... - hochl 2012-04-04 07:10
\S (or [^\s]) won't work anyway, given the examples they wrote under my answer - Joey 2012-04-04 07:17


2

(?<=ITEM_LIST_\d+ = ).*(?= \.)

should match the xxxx in your example above. It requires a regex engine that allows for arbitrary-length lookbehind, though, because the \d+ inside the lookbehind has no fixed length. Most don't (.NET does).
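Since the question is about Java, here is a minimal sketch of the lookaround variant adapted for java.util.regex. It assumes the item number never has more than nine digits, so \d+ is swapped for \d{1,9} to give the lookbehind a finite maximum length. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class BoundedLookbehindDemo {
    // Assumption: \d{1,9} replaces \d+ so the lookbehind has a bounded length.
    private static final Pattern ITEM =
            Pattern.compile("(?<=ITEM_LIST_\\d{1,9} = ).*(?= \\.)");

    static List<String> extract(String text) {
        List<String> values = new ArrayList<>();
        Matcher m = ITEM.matcher(text);
        while (m.find()) {
            values.add(m.group()); // the whole match is the value, no group needed
        }
        return values;
    }

    public static void main(String[] args) {
        String text = "ITEM_LIST_1 = first value .\nITEM_LIST_2 = second value .";
        System.out.println(extract(text)); // prints [first value, second value]
    }
}
```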

Another option would be

ITEM_LIST_\d+\s*=\s*(.*)\s*\.

and use capture group 1. This requires no lookaround, but matches more than you need and uses a capturing group to select a substring of the total match.
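A rough Java sketch of the capturing-group approach, assuming one item per line as in the question's example. Because (.*) is greedy, the space before the final full stop ends up inside group 1, so the sketch trims the captured text. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class CaptureGroupDemo {
    private static final Pattern ITEM =
            Pattern.compile("ITEM_LIST_\\d+\\s*=\\s*(.*)\\s*\\.");

    static List<String> extract(String text) {
        List<String> values = new ArrayList<>();
        Matcher m = ITEM.matcher(text);
        while (m.find()) {
            // group(0) is the whole match; group(1) is only the value.
            // trim() drops the space that the greedy (.*) keeps before the dot.
            values.add(m.group(1).trim());
        }
        return values;
    }

    public static void main(String[] args) {
        String text = "ITEM_LIST_1 = xxxx .\nITEM_LIST_2 = yyyy .";
        System.out.println(extract(text)); // prints [xxxx, yyyy]
    }
}
```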

Both could be fine-tuned to your problem, with better performance and less chance of matching something wrong, if we knew what your xxxx look like.

EDIT: If the items are all in a single line, then the above regex would fail (since .* is greedy):

PS> [regex]::matches('Item_List_01 = Chapter1 overview, Who''s who, Chapter2 How to. Item_List_02 = Continue of Chapter2, Problems.', 'Item_List_\d+\s*=\s*(.*)\.') | select groups

Groups
------
{Item_List_01 = Chapter1 overview, Who's who, Chapter2 How to. Item_List_02 = Continue of Chapter2, Problems., Chapt...

You can fix it by making it lazy:

Item_List_\d+\s*=\s*(.*?)\.

which does work, then:

PS> [regex]::matches('Item_List_01 = Chapter1 overview, Who''s who, Chapter2 How to. Item_List_02 = Continue of Chapter2, Problems.', 'Item_List_\d+\s*=\s*(.*?)\.') | select groups

Groups
------
{Item_List_01 = Chapter1 overview, Who's who, Chapter2 How to., Chapter1 overview, Who's who, Chapter2 How to}
{Item_List_02 = Continue of Chapter2, Problems., Continue of Chapter2, Problems}
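The same lazy pattern can be tried from Java. This is only a sketch using the sample line from the PowerShell session above; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class LazyMatchDemo {
    // (.*?) is lazy: it stops at the first full stop it can.
    private static final Pattern ITEM =
            Pattern.compile("Item_List_\\d+\\s*=\\s*(.*?)\\.");

    static List<String> extract(String text) {
        List<String> values = new ArrayList<>();
        Matcher m = ITEM.matcher(text);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String line = "Item_List_01 = Chapter1 overview, Who's who, Chapter2 How to. "
                + "Item_List_02 = Continue of Chapter2, Problems.";
        // Print one value per line, since the values contain commas themselves.
        extract(line).forEach(System.out::println);
    }
}
```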

However, it will fail again if items have a full-stop in them:

PS> [regex]::matches('Item_List_01 = Foo. Bar. Item_List_02 = Baz, gak.', 'Item_List_\d+\s*=\s*(.*?)\.') | select groups

Groups
------
{Item_List_01 = Foo., Foo}
{Item_List_02 = Baz, gak., Baz, gak}

This can be solved by adding a lookahead (again) which makes sure that either an end of line/string or another item follows:

Item_List_\d+ = (.*?)\.(?=$| Item_List_\d)
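A possible Java version of this last pattern, as a sketch. One assumption goes beyond the answer: Pattern.MULTILINE is set so that $ also matches at the end of each line, which only matters if the input spans several lines. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class LookaheadDemo {
    // Assumption: MULTILINE makes $ match at every line end, not just end of input.
    private static final Pattern ITEM = Pattern.compile(
            "Item_List_\\d+ = (.*?)\\.(?=$| Item_List_\\d)", Pattern.MULTILINE);

    static List<String> extract(String text) {
        List<String> values = new ArrayList<>();
        Matcher m = ITEM.matcher(text);
        while (m.find()) {
            // The lazy group keeps growing until the full stop is followed by
            // a line end or the next item label.
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String line = "Item_List_01 = Foo. Bar. Item_List_02 = Baz, gak.";
        extract(line).forEach(System.out::println);
    }
}
```

With the sample line from the failing case above, group 1 should now come out as "Foo. Bar" and "Baz, gak".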

(Regarding spaces, \s*, etc.: I've been a little sloppy here in changing space handling a few times throughout the solutions. You should know what data you're expecting and adapt the regex accordingly. Also you varied the case of ITEM_LIST/Item_List in your question and comment. You should make that consistent, too.)

2012-04-04 05:53
by Joey
For example : something like ItemList01 = Chapter1 overview, Who's who, Chapter2 How to. ItemList02 = Continue of Chapter2, Problems. etc .. - Darth Blue Ray 2012-04-04 06:00
It will always be ITEM_LIST ... typo from my side. And the regex will be used in a small java program. Thx for all above - Darth Blue Ray 2012-04-04 06:54