How to make 'not contains' regular expression

I have a little problem creating a regular expression. Expected input:

blahblahblah, blahblahblah, 'blahblahblah', "blahblahblah, asdfd"

I need to put the comma-separated words into an array. However, I cannot use the Split function, because commas can also occur inside the quoted strings. The expected output is:

arr[0] = blahblahblah
arr[1] = blahblahblah
arr[2] = 'blahblahblah'
arr[3] = "blahblahblah, asdfd"

Does anybody know a regular expression, or another solution, that would give me similar output? Please help.

2012-04-04 16:43
by user35443
I just need to get the comma-separated words from the input - user35443 2012-04-04 16:49
.. and I cannot split it - user35443 2012-04-04 16:49
looks suspiciously like CSV format - Jodrell 2012-04-04 16:51
Yes, I need values separated by comma - user35443 2012-04-04 16:52
Except when the comma is inside double quotes. But what about double quotes within double quotes, is that allowed? - Jodrell 2012-04-04 16:56
no. That's not allowed - user35443 2012-04-04 16:59
So, is this actually any line of CSV, or is the problem limited exactly to your example and just pseudo-CSV? - Jodrell 2012-04-04 17:00
CSV does not support 'blahblah', just blahblah or "blahblah" - Ωmega 2012-04-04 17:14
How do you want to handle strings like "First "" item"? In CSV that is a single value, because "" is converted to " inside a quoted item. - Ωmega 2012-04-04 17:16
This is a twist on the classic XY Problem. Your actual problem is how to split input by commas, except ones in quotes. The title of your question makes no mention of your actual problem! This makes it less likely that you'll get the help you need. You're limiting the pool of answerers to people who are both interested enough in problem Y to read further, and know enough about problem X to give a good solution - Kevin 2012-04-04 17:27
It's not clear how you want to handle spaces between items, or newlines. - Ωmega 2012-04-04 17:28
I suggest you convert the input to standard CSV and then use a technique designed for that standard. - Ωmega 2012-04-04 17:29


You could do something like this, given the limited problem. The regex is shorter, but this is possibly simpler.

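using System.Collections.Generic;
using System.Text;

// Returns the comma-separated fields of one line; commas inside "..." are kept.
static IEnumerable<string> SplitLine(string line)
{
    var result = new StringBuilder();
    var inQuotes = false;

    foreach (char c in line)
    {
        switch (c)
        {
            case '"':
                // Keep the quote in the output and toggle the quoted state.
                result.Append(c);
                inQuotes = !inQuotes;
                break;

            case ',':
                if (!inQuotes)
                {
                    // Separator: emit the field, dropping the space after the previous comma.
                    yield return result.ToString().Trim();
                    result.Clear();
                }
                else
                {
                    // A comma inside quotes belongs to the value.
                    result.Append(c);
                }
                break;

            default:
                result.Append(c);
                break;
        }
    }

    // The last field has no trailing comma, so emit it here.
    yield return result.ToString().Trim();
}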
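The loop above only tracks double quotes, while the comments below point out that ' must be supported too, and that a naive fix would wrongly split "one', 'two". A minimal sketch of one way to cover both, assuming quotes are never nested or escaped (as confirmed in the question's comments): remember which quote character opened the current run, and only let that same character close it. The class and method names are illustrative, not from the answer.

```csharp
using System.Collections.Generic;
using System.Text;

public static class QuoteAwareSplitter
{
    // Splits on commas that are outside '...' and "..." runs.
    // Assumption: quotes are never nested or escaped (confirmed for this question).
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        char? openQuote = null; // quote character that opened the current run, or null

        foreach (char c in line)
        {
            if (openQuote == null && (c == '"' || c == '\''))
            {
                openQuote = c;            // entering a quoted run
            }
            else if (c == openQuote)
            {
                openQuote = null;         // only the same quote character closes the run
            }
            else if (c == ',' && openQuote == null)
            {
                fields.Add(current.ToString().Trim());
                current.Clear();
                continue;                 // the separator is not part of any field
            }
            current.Append(c);            // quotes are kept, as in the expected output
        }

        fields.Add(current.ToString().Trim());
        return fields;
    }
}
```

With this rule, `a, "one', 'two", b` yields three fields, because the ' characters inside the double-quoted run are treated as ordinary text.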
2012-04-04 17:13
by Jodrell
user35443 also wants support for ', not just ", even though that is not standard behavior. - Ωmega 2012-04-04 17:26
I did it before I read this post. But it works - user35443 2012-04-04 17:38
@user35443 - Then you should edit your question, because you accepted an answer that is not what the question asks for. SO is here for other readers as well, so don't confuse them - Ωmega 2012-04-04 17:46
The use of yield return and fallthrough case blocks is not recommended. However, I do like the concept. Fast and easy to understand. Also: @stackoverflow: Simple fix - Mooing Duck 2012-04-04 17:47
@MooingDuck - I meant to edit question, not answer. Your edit makes code useless, as it will now match "one', 'two" as two elements - Ωmega 2012-04-04 17:53
@stackoverflow: Ah, didn't think of nesting. I rolled the edit back, that's a much more substantial edit than I thought it was - Mooing Duck 2012-04-04 17:54


I'm not sure this is optimal, but it produced the correct output for your test case on http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx:

(?>"[^"]*")|(?>'[^']*')|(?>[^,\s]+)

C# string version:

@"(?>""[^""]*"")|(?>'[^']*')|(?>[^,\s]+)"
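To turn the pattern into the array the question asks for, one option is to collect every match with `Regex.Matches`. This is a sketch that reuses the C# string from this answer unchanged; the wrapper class and method names are illustrative. Like the pattern itself, it assumes unquoted items contain no whitespace and quoted items contain no quote of their own kind.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedTokenRegex
{
    // Pattern from the answer: a "..." run, a '...' run,
    // or a run of characters that are neither commas nor whitespace.
    const string Pattern = @"(?>""[^""]*"")|(?>'[^']*')|(?>[^,\s]+)";

    // Each match is one item; commas and spaces between items are never matched.
    public static string[] Tokens(string line) =>
        Regex.Matches(line, Pattern).Cast<Match>().Select(m => m.Value).ToArray();
}
```

Commas between items are simply skipped because no alternative can start on them, which is why no explicit split step is needed.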
2012-04-04 16:54
by FishBasketGordo
+1 for teaching me about atomic groups - Robbie 2012-04-04 17:02
Will not work for "first "" item", "Second Item", Third - Ωmega 2012-04-04 17:13
@stackoverflow - Yes, and I didn't expect it to. This requires the quoted strings to not contain similar quotation marks. As I said, it produces the correct output for the (limited) test case given - FishBasketGordo 2012-04-04 17:36
@FishBasketGordo - your code works for the limited specification, which is what user35443 asked for. - Ωmega 2012-04-04 17:44


One possible approach is to split by commas (using string.Split, not RegEx) and then iterate over the results. For each result that contains 0 or 2 ' or " characters, add it to a new list. When a result contains 1 ' or ", re-join subsequent items (adding a comma) until the result has 2 ' or ", then add that to the new list.
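A sketch of the split-then-rejoin idea described above. It generalizes "0 or 2" to "an even number of quote characters", which is an assumption of this sketch, and, like the description, it counts ' and " together, so an apostrophe inside an unquoted word would confuse it. Names are illustrative.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SplitThenRejoin
{
    // Counts ' and " characters; an odd count means Split cut a quoted value apart.
    static int QuoteCount(string s) => s.Count(c => c == '\'' || c == '"');

    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        string pending = null; // pieces re-joined so far while a quote is still open

        foreach (string part in line.Split(','))
        {
            // Re-insert the comma that string.Split removed.
            pending = pending == null ? part : pending + "," + part;

            // Even number of quote characters: the value is complete.
            if (QuoteCount(pending) % 2 == 0)
            {
                fields.Add(pending.Trim());
                pending = null;
            }
        }

        // Unbalanced quotes at the end of input: keep the remainder rather than drop it.
        if (pending != null)
            fields.Add(pending.Trim());

        return fields;
    }
}
```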

2012-04-04 16:58
by Jay
Oh, well that's a simple solution - Mooing Duck 2012-04-04 17:26
@MooingDuck - are you serious? - Ωmega 2012-04-04 17:30
@stackoverflow: This is not the fastest or most elegant answer, but it's very simple to understand, and gets the right results. I can't validate the rest of the answers, because those Regexes are beyond me. This and Jodrell's are the only suggestions that I could do - Mooing Duck 2012-04-04 17:43
@MooingDuck - agreed - Ωmega 2012-04-04 17:45


Instead of rolling your own CSV parser, consider using the standard, out-of-the-box TextFieldParser class that ships with the .NET Framework.
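A minimal sketch of parsing a single line with `TextFieldParser` (namespace `Microsoft.VisualBasic.FileIO`), reading from a `StringReader` instead of a file; the wrapper names are illustrative. Note that this follows standard CSV rules: the surrounding double quotes are removed from a quoted field, and single quotes are ordinary characters, so `'a, b'` would still be split. That differs from the question's non-standard expected output.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvLine
{
    public static string[] Parse(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // commas inside "..." do not split
            parser.TrimWhiteSpace = true;            // drop the space after each comma
            return parser.ReadFields();
        }
    }
}
```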

Or alternatively, use Microsoft Ace and an OleDbDataReader to read the files directly through ADO.NET. A sample can be found in a number of other posts, like this one. And there's this older post on CodeProject which you can use as a sample. Just make sure you're referencing the latest Ace driver instead of the old Jet.OLEDB.4.0 driver.

These options are a lot easier to maintain in the long run than any custom built file parser. And they already know how to handle the many corner cases that surround the not so well documented CSV format.

2012-04-04 23:00
by jessehouwing