I have a small problem creating a regular expression. Expected input:
blahblahblah, blahblahblah, 'blahblahblah', "blahblahblah, asdfd"
I need to get the comma-separated items into an array. However, I cannot use the Split function, because a comma can also occur inside the quoted strings. So the expected output is:
arr[0] = blahblahblah
arr[1] = blahblahblah
arr[2] = 'blahblahblah'
arr[3] = "blahblahblah, asdfd"
Does anybody know a regular expression, or some other solution, that would give me similar output? Please help.
'blahblah', just blahblah or "blahblah" - Ωmega 2012-04-04 17:14
"First "" item", as by CSV it is one string, because "" is converted to " inside of the string item. - Ωmega 2012-04-04 17:16
You could do something like this, given the limited problem. The regex is shorter and possibly simpler.
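using System.Collections.Generic;
using System.Text;

// Splits a line on commas that are not inside double quotes.
// Wrapped in a method because yield return is only valid inside an iterator.
static IEnumerable<string> SplitLine(string line)
{
    var result = new StringBuilder();
    var inQuotes = false;
    foreach (char c in line)
    {
        switch (c)
        {
            case '"':
                result.Append(c);      // keep the quote in the output
                inQuotes = !inQuotes;
                break;
            case ',':
                if (inQuotes)
                {
                    result.Append(c);  // comma belongs to the quoted item
                }
                else
                {
                    yield return result.ToString().Trim();
                    result.Clear();
                }
                break;                 // C# does not allow falling through to default
            default:
                result.Append(c);
                break;
        }
    }
    // The last item has no trailing comma, so emit it after the loop.
    yield return result.ToString().Trim();
}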
', not just ", even if it is not standard behavior. - Ωmega 2012-04-04 17:26
yield return and fallthrough case blocks is not recommended. However, I do like the concept. Fast and easy to understand. Also: @stackoverflow: Simple fix - Mooing Duck 2012-04-04 17:47
"one', 'two" as two elements - Ωmega 2012-04-04 17:53
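The comments above ask for single quotes to count as delimiters too, and point out that a mixed case such as "one', 'two" should not be cut in two. One way to get that, sketched here under assumptions, is to remember which quote character opened the current item and close it only on the same character. The names QuoteTracker and Split are hypothetical, and the sketch assumes that apostrophes never appear in unquoted items (a word like don't would open a quoted item).

```csharp
using System.Collections.Generic;
using System.Text;

static class QuoteTracker
{
    // Hypothetical helper: splits on commas that are outside '...' or "..." items.
    public static List<string> Split(string line)
    {
        var items = new List<string>();
        var current = new StringBuilder();
        char openQuote = '\0'; // '\0' means "not inside a quoted item"

        foreach (char c in line)
        {
            if (openQuote != '\0')
            {
                // Inside a quoted item: keep everything, including commas.
                current.Append(c);
                if (c == openQuote) openQuote = '\0'; // only the same quote closes it
            }
            else if (c == '"' || c == '\'')
            {
                current.Append(c);
                openQuote = c;
            }
            else if (c == ',')
            {
                items.Add(current.ToString().Trim());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        items.Add(current.ToString().Trim()); // last item has no trailing comma
        return items;
    }
}
```

With this variant, the input "one', 'two", three should come back as two items: the whole double-quoted string, then three.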
I'm not sure this is optimal, but it produced the correct output from your test case on http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx:
(?>"[^"]*")|(?>'[^']*')|(?>[^,\s]+)
C# string version:
@"(?>""[^""]*"")|(?>'[^']*')|(?>[^,\s]+)"
"first "" item", "Second Item", Third - Ωmega 2012-04-04 17:13
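For reference, the pattern from this answer could be applied with Regex.Matches roughly as follows. This is a minimal sketch: RegexSplitDemo and Items are hypothetical names, not part of the answer. Note that the last alternative, [^,\s]+, stops at whitespace, so an unquoted item containing spaces would be returned as several matches, and a doubled quote inside a quoted item (as in the comment above) is not treated as an escape.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class RegexSplitDemo
{
    // The pattern from the answer, as a C# verbatim string.
    const string Pattern = @"(?>""[^""]*"")|(?>'[^']*')|(?>[^,\s]+)";

    // Hypothetical helper: returns every match of the pattern, in order.
    public static string[] Items(string line) =>
        Regex.Matches(line, Pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

Called on the line from the question, Items should return the four items with their quotes preserved.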
One possible approach is to split by commas (using string.Split, not RegEx) and then iterate over the results. For each result that contains 0 or 2 ' or " characters, add it to a new list. When a result contains 1 ' or ", re-join subsequent items (adding a comma) until the result has 2 ' or ", then add that to the new list.
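The split-and-re-join idea described above might look like the following sketch. QuoteAwareSplitter, CountQuotes, and SplitAndRejoin are hypothetical names, and the code counts ' and " together exactly as the description does, so it inherits the same limitation with mixed quotes or apostrophes inside an item. Trimming the surrounding whitespace is an added assumption, made to match the expected output in the question.

```csharp
using System.Collections.Generic;
using System.Linq;

static class QuoteAwareSplitter
{
    // Counts ' and " characters in a fragment.
    static int CountQuotes(string s) => s.Count(ch => ch == '\'' || ch == '"');

    public static List<string> SplitAndRejoin(string line)
    {
        var items = new List<string>();
        string pending = null; // fragment still waiting for its closing quote

        foreach (string part in line.Split(','))
        {
            // Re-join with the comma that Split removed.
            pending = pending == null ? part : pending + "," + part;

            if (CountQuotes(pending) != 1)
            {
                // 0 or 2 quote characters: the item is complete.
                items.Add(pending.Trim());
                pending = null;
            }
        }

        // An unbalanced quote at the end of the line: keep what was collected.
        if (pending != null) items.Add(pending.Trim());
        return items;
    }
}
```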
Instead of rolling your own CSV parser, consider using the standard, out-of-the-box TextFieldParser class that ships with the .NET Framework.
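As a rough illustration of the TextFieldParser route, the sketch below reads delimited text from a string. CsvLineReader and Parse are hypothetical names. TextFieldParser lives in the Microsoft.VisualBasic.FileIO namespace, so on the .NET Framework the project needs a reference to the Microsoft.VisualBasic assembly. Keep in mind that the parser removes the enclosing double quotes from a field, which differs from the expected output in the question, and that, as far as I know, it does not treat single quotes as field delimiters.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

static class CsvLineReader
{
    // Hypothetical helper: parses comma-delimited text into rows of fields.
    public static List<string[]> Parse(string text)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(new StringReader(text)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // commas inside "..." stay in the field

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

For example, parsing the single line a,"b, c",d should give one row with the three fields a, b, c (including its comma), and d.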
Alternatively, use Microsoft Ace and an OleDbDataReader to read the files directly through ADO.NET. A sample can be found in a number of other posts, like this one. And there's this older post on CodeProject which you can use as a sample. Just make sure you're referencing the latest Ace driver instead of the old Jet.OLEDB.4.0 driver.
These options are a lot easier to maintain in the long run than any custom-built file parser, and they already know how to handle the many corner cases that surround the not-so-well-documented CSV format.