I have some text files containing a lot of information. Most of the sentences in them start with a label from a fixed list of items. Some items I can already extract with other regexes (for dates, URLs, emails, etc.), so I'm using those. But for the remaining items I have no idea where to start.
For example:
ITEM_LIST_1 = xxxx .
ITEM_LIST_2 = xxxx .
ITEM_LIST_3 = xxxx .
....
I'm looking for a regex that extracts the xxxx part (the sentence) for each of these items.
Thanks, all.
\S (or [^\s]) won't work anyway, given the examples they wrote under my answer. - Joey 2012-04-04 07:17
(?<=ITEM_LIST_\d+ = ).*(?= \.)
should match the xxxx in your example above. It requires a regex engine that allows arbitrary-length lookbehind, though (the \d+ inside the lookbehind makes its width variable). Many engines don't support that (.NET does).
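To illustrate the engine restriction, here is a small sketch using Python's built-in re module, which only accepts fixed-width lookbehind. The sample text and the FIXED pattern are illustrative assumptions: FIXED only works if every item number has exactly one digit. (The third-party regex package on PyPI does accept variable-length lookbehind, if switching engines is an option.)

```python
import re

# Pattern from the answer: \d+ gives the lookbehind a variable width.
VARIABLE = r"(?<=ITEM_LIST_\d+ = ).*(?= \.)"
# Hypothetical workaround: assume single-digit item numbers, so the width is fixed.
FIXED = r"(?<=ITEM_LIST_\d = ).*(?= \.)"


def compiles(pattern):
    """Return True if Python's built-in re accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        # re raises "look-behind requires fixed-width pattern" here.
        return False
    return True


text = "ITEM_LIST_1 = first sentence .\nITEM_LIST_2 = second one ."

print(compiles(VARIABLE))       # False
print(re.findall(FIXED, text))  # ['first sentence', 'second one']
```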
Another option would be
ITEM_LIST_\d+\s*=\s*(.*)\s*\.
and use capture group 1. This needs no lookaround, but the overall match contains more than you need, so the capturing group is used to select the relevant substring.
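A minimal sketch of the capture-group approach in Python, using the answer's pattern unchanged. The sample text is an assumption modeled on the question's format. Note that because (.*) is greedy, the following \s* matches nothing, so group 1 keeps the space before the full stop; the sketch strips it.

```python
import re
from typing import List

# Capture-group variant from the answer; no lookaround, so it works in most engines.
PATTERN = re.compile(r"ITEM_LIST_\d+\s*=\s*(.*)\s*\.")


def extract_items(text: str) -> List[str]:
    # Group 1 holds the sentence; strip the trailing space the greedy (.*) keeps.
    return [m.group(1).strip() for m in PATTERN.finditer(text)]


sample = "ITEM_LIST_1 = first sentence .\nITEM_LIST_2 = second one .\n"
print(extract_items(sample))  # ['first sentence', 'second one']
```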
Both could be fine-tuned for better performance and a lower chance of matching the wrong thing if we knew what your xxxx looks like.
EDIT: If all the items are on a single line, the regex above fails, because .* is greedy:
PS> [regex]::matches('Item_List_01 = Chapter1 overview, Who''s who, Chapter2 How to. Item_List_02 = Continue of Chapter2, Problems.', 'Item_List_\d+\s*=\s*(.*)\.') | select groups
Groups
------
{Item_List_01 = Chapter1 overview, Who's who, Chapter2 How to. Item_List_02 = Continue of Chapter2, Problems., Chapt...
You can fix this by making the quantifier lazy:
Item_List_\d+\s*=\s*(.*?)\.
which then works:
PS> [regex]::matches('Item_List_01 = Chapter1 overview, Who''s who, Chapter2 How to. Item_List_02 = Continue of Chapter2, Problems.', 'Item_List_\d+\s*=\s*(.*?)\.') | select groups
Groups
------
{Item_List_01 = Chapter1 overview, Who's who, Chapter2 How to., Chapter1 overview, Who's who, Chapter2 How to}
{Item_List_02 = Continue of Chapter2, Problems., Continue of Chapter2, Problems}
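For readers not using .NET, the same greedy-versus-lazy comparison can be reproduced with Python's re.findall, which returns group 1 for each match. This is a sketch on the same single-line input as the PowerShell demo; the only change is the quantifier.

```python
import re

line = ("Item_List_01 = Chapter1 overview, Who's who, Chapter2 How to. "
        "Item_List_02 = Continue of Chapter2, Problems.")

# Greedy: (.*) runs to the end and backtracks only to the LAST full stop,
# so both items end up in a single match.
greedy = re.findall(r"Item_List_\d+\s*=\s*(.*)\.", line)

# Lazy: (.*?) stops at the FIRST full stop, giving one match per item.
lazy = re.findall(r"Item_List_\d+\s*=\s*(.*?)\.", line)

print(greedy)
print(lazy)
```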
However, it fails again if an item contains a full stop:
PS> [regex]::matches('Item_List_01 = Foo. Bar. Item_List_02 = Baz, gak.', 'Item_List_\d+\s*=\s*(.*?)\.') | select groups
Groups
------
{Item_List_01 = Foo., Foo}
{Item_List_02 = Baz, gak., Baz, gak}
This can be solved by adding a lookahead (again) that ensures the full stop is followed either by the end of the line/string or by another item:
Item_List_\d+ = (.*?)\.(?=$| Item_List_\d)
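A sketch of the lookahead version in Python. One assumption to flag: without a multiline option, $ matches only at the end of the whole string (in .NET as well as Python), so the sketch passes re.MULTILINE to let $ also match at the end of each line when items sit on separate lines. The sample strings are illustrative.

```python
import re

# A full stop ends an item only if it is followed by end of line
# (re.MULTILINE makes $ match there) or by the next item label.
ITEM = re.compile(r"Item_List_\d+ = (.*?)\.(?=$| Item_List_\d)", re.MULTILINE)


def extract_items(text):
    # With one capturing group, findall returns just the captured text.
    return ITEM.findall(text)


one_line = "Item_List_01 = Foo. Bar. Item_List_02 = Baz, gak."
multi_line = "Item_List_01 = Foo. Bar.\nItem_List_02 = Baz, gak."

print(extract_items(one_line))    # ['Foo. Bar', 'Baz, gak']
print(extract_items(multi_line))  # ['Foo. Bar', 'Baz, gak']
```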
(Regarding spaces, \s*, etc.: I've been a little sloppy and changed the whitespace handling several times across these solutions. You should know what data you're expecting and adapt the regex accordingly. Also, you used both ITEM_LIST and Item_List in your question and comment. You should make that consistent, too.)
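If the label case in the data can't be made consistent, a case-insensitive match is one option. This is a sketch using Python's re.IGNORECASE flag on the lookahead pattern; the inline (?i) modifier is an alternative that .NET also understands. The mixed-case sample is an assumption.

```python
import re

# IGNORECASE lets one pattern accept ITEM_LIST, Item_List, item_list, ...
ITEM = re.compile(r"item_list_\d+ = (.*?)\.(?=$| item_list_\d)",
                  re.IGNORECASE | re.MULTILINE)

mixed = "ITEM_LIST_1 = Foo. Bar.\nItem_List_2 = Baz, gak."
print(ITEM.findall(mixed))  # ['Foo. Bar', 'Baz, gak']
```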