Geeks With Blogs
Matt Watson Software developer, product visionary, and master of #dadops

After much searching, this is the best RegEx I can find for splitting a line of text from a CSV file:
(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)

I found it here: http://thedotnet.com/howto/work213583.aspx

Here is the magical working code:

      // Requires: using System.Text.RegularExpressions;
      protected virtual string[] SplitCSV(string line)
      {
         RegexOptions options = RegexOptions.IgnorePatternWhitespace
            | RegexOptions.Multiline
            | RegexOptions.IgnoreCase;
         Regex reg = new Regex("(?:^|,)(\\\"(?:[^\\\"]+|\\\"\\\")*\\\"|[^,]*)", options);
         MatchCollection coll = reg.Matches(line);
         string[] items = new string[coll.Count];
         int i = 0;
         foreach (Match m in coll)
         {
            // Groups[0] is the whole match, so strip the leading comma,
            // the surrounding quotes, and any outer whitespace.
            items[i++] = m.Groups[0].Value.Trim('"').Trim(',').Trim('"').Trim();
         }
         return items;
      }

Posted on Saturday, September 4, 2004 8:15 AM


Comments on this post: RegEx for CSV

# re: RegEx for CSV
thou are god
thank you
Left by bob the coder on Jun 24, 2005 4:18 PM

# re: RegEx for CSV
I can't get this regex to work if the CSV looks like this...

Title,Price,Description
HelloWorld,"10,00",Desc

I get an array length of 4 for the second line.
Left by Steven on Nov 04, 2005 1:47 AM

# re: RegEx for CSV
I changed it a little to be more 'up-to-date' for Java and to work in any case.

static String[] splitCSV( String line ) {
    // java.util.ArrayList<String> elements = new java.util.ArrayList<String>(); // JAVA >=1.5
    java.util.ArrayList elements = new java.util.ArrayList(); // JAVA <=1.4
    java.util.regex.Matcher m = java.util.regex.Pattern.compile( "(?:^|,)(\"(?:[^\"]|\"\")*\"|[^,]*)" ).matcher( line );
    while( m.find() ) {
        elements.add( m.group()
            .replaceAll( "^,", "" ) // remove first comma if any
            .replaceAll( "^?\"(.*)\"$", "$1" ) // remove outer quotations if any
            .replaceAll( "\"\"", "\"" ) ); // replace double inner quotations if any
    }
    return (String[])elements.toArray( new String[0] );
}
Left by Mihi on Mar 12, 2007 3:35 AM

# re: RegEx for CSV
Doesn't work if you have a comma in your value.
Left by cheezus on Jul 19, 2008 3:48 PM

# re: RegEx for CSV
Hi
I've found a problem with the Regex used in the following code:


System.Text.RegularExpressions.RegexOptions options =
    ((System.Text.RegularExpressions.RegexOptions.IgnorePatternWhitespace |
    System.Text.RegularExpressions.RegexOptions.Multiline) |
    System.Text.RegularExpressions.RegexOptions.IgnoreCase);
Regex reg = new Regex("(?:^|,)(\\\"(?:[^\\\"]+|\\\"\\\")*\\\"|[^,]*)", options);
MatchCollection coll = reg.Matches(line);
string[] items = new string[coll.Count];
int i = 0;
foreach (Match m in coll)
{
    items[i++] = m.Groups[0].Value.Trim('"').Trim(',').Trim('"').Trim();
}

This code splits a line of CSV. Specifically, the call to Regex.Matches gets stuck for around 10 minutes when I pass it the following line:

string line = "C,\"123 PSST, MASSACHUSETTS,5245352,3343432";



Any help will be appreciated.

Thank you.
Left by SandeepT on Mar 04, 2009 1:29 AM
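
The slowdown described above is most likely catastrophic backtracking. The test line has an opening quote but no closing one, and the inner loop (?:[^\"]+|\"\")* can split the roughly 40 characters after that quote into chunks in an exponential number of ways before the quoted branch finally gives up. Mihi's Java variant above sidesteps this by dropping the inner +. The sketch below is another option in Java, assuming possessive quantifiers (++ and *+) are acceptable; the class and method names are made up for illustration and are not from the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BacktrackingSafeCsv {
    // Same shape as the regex from the post, but the inner loop uses
    // possessive quantifiers, so a failed quoted-field attempt is abandoned
    // at once instead of being retried with every possible chunking.
    static final Pattern FIELD =
        Pattern.compile("(?:^|,)(\"(?:[^\"]++|\"\")*+\"|[^,]*)");

    // Returns the raw fields (quotes are left in place, leading commas are not).
    static List<String> rawFields(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            fields.add(m.group(1)); // group 1 excludes the leading comma
        }
        return fields;
    }

    public static void main(String[] args) {
        // The unbalanced line from the comment above.
        String line = "C,\"123 PSST, MASSACHUSETTS,5245352,3343432";
        System.out.println(rawFields(line));
    }
}
```

With the unbalanced quote, the quoted branch fails quickly and the plain branch takes over, so the line is simply split at every comma. Whether that is the right result for malformed input is a separate decision.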

# re: RegEx for CSV
For a personal project, I'm using this one which I thought up:

("(?:[^\\"]|\\.)*")\s*($|,)

It's not exactly a pure CSV regexp, but it works for the purpose I designed it for. It matches everything between a pair of double quotes in a CSV file, including backslash-escaped double quotes. You can use commas in the values, and empty values are ignored. If it's fed incorrect data such as [ "value 1","value 2" error here "value 3", "value 4" ], it will ignore the second value and return the last usable quoted string before the comma.

The quoted content is returned in the first (and only) matching group, stripped of remaining leading and trailing whitespace (and newlines).

There might still be a few errors lurking in this regexp, but it works 100% for me.
Left by R. Hanouwer on Apr 08, 2009 10:13 AM

# re: RegEx for CSV
With regard to my previous reply:

Add a "?:" in the last matching group to correct the second matching group appearing in the resultset. I copied the wrong regexp from my text editor. Sorry for that--my bad.

This is the (full) correct regexp:

("(?:[^\\"]|\\.)*")\s*(?:$|,)
Left by R. Hanouwer on Apr 08, 2009 10:17 AM
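
To make the behaviour of that corrected regexp concrete, here is a small Java sketch that runs it over the malformed sample from the earlier reply. The class and method names are hypothetical. Note that group 1 still includes the surrounding double quotes, because the parentheses wrap them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedValueExtractor {
    // ("(?:[^\\"]|\\.)*")\s*(?:$|,) written as a Java string literal.
    static final Pattern QUOTED =
        Pattern.compile("(\"(?:[^\\\\\"]|\\\\.)*\")\\s*(?:$|,)");

    // Collects every quoted string that is followed by a comma or the end of the line.
    static List<String> quotedValues(String line) {
        List<String> values = new ArrayList<>();
        Matcher m = QUOTED.matcher(line);
        while (m.find()) {
            values.add(m.group(1)); // still wrapped in its double quotes
        }
        return values;
    }

    public static void main(String[] args) {
        String line = "\"value 1\",\"value 2\" error here \"value 3\", \"value 4\"";
        // "value 2" is skipped because it is not followed by a comma or end of line.
        System.out.println(quotedValues(line));
    }
}
```

Unquoted fields are never matched by this pattern, so it only suits files where every value is quoted.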

# re: RegEx for CSV
thou are good
thank you
Left by ferdi tayfur dinle on Nov 12, 2009 7:33 AM

# re: RegEx for CSV
Thank you
Left by selda bağcan dinle on Feb 17, 2010 5:32 AM

# re: RegEx for CSV
Thank you
Left by yozgat chat on Feb 18, 2010 6:00 AM

# re: RegEx for CSV
Thanks for this and thanks @Mihi for the Java version!
Left by fugu on Aug 26, 2010 2:04 AM

# re: RegEx for CSV
Thank u!!!!
Left by Sebastián Rojas on Oct 06, 2011 6:00 PM

# re: RegEx for CSV
There's something broken in that regex. If you go to http://regexpal.com/ and test it with the string

,test,test2,test3,test4,test5

You'll see it skips matching the first test.
Left by Chris Marisic on Nov 28, 2011 9:51 AM
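
The skipped field comes from the empty match at the start of the line: the regex matches an empty first field without consuming the comma, and the engine then resumes one character further on, past that comma. One possible way around it, sketched here in Java, is to match each field together with the delimiter that follows it rather than the one before it. The pattern, class, and method names below are illustrative assumptions, not part of the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CsvLineSplitter {
    // group 1: contents of a quoted field (doubled quotes still escaped)
    // group 2: an unquoted field
    // group 3: the delimiter after the field, empty at the end of the line
    static final Pattern FIELD =
        Pattern.compile("\\G(?:\"((?:[^\"]|\"\")*)\"|([^,]*))(,|$)");

    // Assumes 'line' is a single record without its line terminator.
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            String quoted = m.group(1);
            // Unescape "" to " only for quoted fields.
            fields.add(quoted != null ? quoted.replace("\"\"", "\"") : m.group(2));
            if (m.group(3).isEmpty()) {
                break; // reached the end of the line
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitCsvLine(",test,test2,test3,test4,test5"));
        System.out.println(splitCsvLine("HelloWorld,\"10,00\",Desc"));
    }
}
```

Because every match except the last consumes a comma, an empty leading field is kept, and a trailing comma produces a final empty field.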

# re: RegEx for CSV
Thank you very much!!!
Left by furier on Jun 25, 2012 7:15 PM

# re: RegEx for CSV
hi

You can prepend "" to the line when it starts with a comma, like this:

if (line.startsWith(",")) {
line = "\"\"" + line;
}
Left by furier on Jun 25, 2012 7:56 PM

# re: RegEx for CSV
Thanks a lot dude!
Left by OThoniel Reyna on Aug 15, 2012 5:01 PM



Copyright © Matt Watson | Powered by: GeeksWithBlogs.net