Parsing data in C#

Parsing data in C# is a simple process that needs only basic string methods. The String.IndexOf and String.Substring methods provide this functionality, and when used properly they can make data collection or simple parsing operations much easier.

When parsing data from a source such as a web page, first remove any unnecessary data at the beginning of the document so that the parser does not find the wrong information. Take the following RSS feed as an example:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
   <title>website design search results</title>
   <link>http://randomexamplesiteurl.com/</link>
   <language>en</language>
   <pubDate>Wed, 15 Apr 2009 18:31:33 GMT</pubDate>
   <lastBuildDate>Wed, 15 Apr 2009 18:31:33 GMT</lastBuildDate>
   <image>
      <title>website design - sample feed</title>
      <url>http://randomexamplesiteurl.com/testimage1.gif</url>
      <link>http://randomexamplesiteurl.com/</link>
   </image>
   <item>
      <title>Small Businesses Receive Web Design Financing from Wildfire</title>
      <link>http://randomexamplesiteurl.com/testlink1.html</link>
      <pubDate>Wed, 15 Apr 2009 07:15:30 GMT</pubDate>
      <description>This is a sample description I am using for testing purposes</description>
   </item>
   <item>
      <title>Effective website design for successful ecommerce</title>
      <link>http://randomexamplesiteurl.com/testlink2.html</link>
      <pubDate>Wed, 15 Apr 2009 11:23:38 GMT</pubDate>
      <description>This is a sample description I am using for testing purposes</description>
   </item>
   <description>website design - XML Sample</description>
</channel>
</rss>

Finding unique tags that mark the beginning of the data you want is the key to building an efficient parser. In the sample above, all of the text before the first "<item>" tag is irrelevant if you only want the item data. Removing it also matters because the channel and image elements contain their own <title> tags, which a search for "<title>" would otherwise find first. To remove this text, use the following code (it assumes the data is loaded in a string variable named strData):

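// Keep everything from the first <item> tag onward; IndexOf returns -1 if the tag is missing
int intStartPos = strData.IndexOf("<item>");
string strWorkingRSS = intStartPos >= 0 ? strData.Substring(intStartPos) : string.Empty;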
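The same trimming step can be wrapped in a small method so it is easy to reuse with other start markers. This is a minimal sketch: the FeedTrimmer class and TrimBefore method are hypothetical names, and it passes StringComparison.Ordinal because the single-argument String.IndexOf(string) overload performs a culture-sensitive search, which is not usually what you want when matching markup tags. It returns an empty string instead of throwing when the marker is missing.

```csharp
using System;

// Hypothetical helper for the "trim everything before a unique tag" step.
public static class FeedTrimmer
{
    // Returns data starting at the first occurrence of marker, or an empty
    // string when the marker is missing (IndexOf returns -1 in that case,
    // and Substring(-1) would throw ArgumentOutOfRangeException).
    public static string TrimBefore(string data, string marker)
    {
        int startPos = data.IndexOf(marker, StringComparison.Ordinal);
        return startPos < 0 ? string.Empty : data.Substring(startPos);
    }
}
```

With the sample feed, FeedTrimmer.TrimBefore(strData, "<item>") gives the same result as the code above.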

Once the irrelevant data has been removed, you can focus on parsing the rest of the string. The following code captures the text between any unique opening and closing strings. It always stops at the first match of the search string, so if you keep trimming the text as you go, as in the sample above, you can easily write a loop that pulls out each item until all of the data has been parsed. This sample assigns the text between the first "<title>" and "</title>" tags to the variable strTitle.

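// Search the trimmed string so the channel and image <title> tags are skipped
string strOpenString = "<title>";
string strTitle = string.Empty;
int intOpenPos = strWorkingRSS.IndexOf(strOpenString);
if (intOpenPos >= 0)
{
    intStartPos = intOpenPos + strOpenString.Length;
    int intEndPos = strWorkingRSS.IndexOf("</title>", intStartPos);
    if (intEndPos >= 0) strTitle = strWorkingRSS.Substring(intStartPos, intEndPos - intStartPos);
}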
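Because each search stops at the first match, the trim-and-repeat loop described above can be sketched as follows. This assumes the input has already been trimmed to start at the first <item> tag; SubstringParser and GetAllBetween are hypothetical names, and the ordinal comparison is a choice made for this sketch rather than something the original code uses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: collects every value between openTag and closeTag.
public static class SubstringParser
{
    public static List<string> GetAllBetween(string source, string openTag, string closeTag)
    {
        var results = new List<string>();
        string working = source;
        while (true)
        {
            int openPos = working.IndexOf(openTag, StringComparison.Ordinal);
            if (openPos < 0) break;

            int start = openPos + openTag.Length;
            int end = working.IndexOf(closeTag, start, StringComparison.Ordinal);
            if (end < 0) break;

            results.Add(working.Substring(start, end - start));

            // Trim past the closing tag so the next pass finds the next occurrence.
            working = working.Substring(end + closeTag.Length);
        }
        return results;
    }
}
```

Called as SubstringParser.GetAllBetween(strWorkingRSS, "<title>", "</title>"), it should return the two item titles from the sample feed. Passing the untrimmed strData instead would also return the channel and image titles, which is why the trimming step matters.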

This should be enough information to get most parsing projects started. The data I used for my sample was XML, but the real value of this type of parser is in cases where data from an HTML site, or a group of HTML pages, needs to be moved to a dynamic location such as a database. Often the only viable option for transferring that data is a "screen scraping" application, and this code gives a general outline for building one in almost any circumstance.
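As a sketch of preparing scraped data for a database, the same technique can group several fields per item. This assumes each record needs only the title, link, and publication date; the FeedItem record and FeedItemParser class are hypothetical, the database insert itself is left out, and records require C# 9 or later. Fields that are missing from an item come back as null.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record shape for one <item>; adjust it to match your table.
public record FeedItem(string? Title, string? Link, string? PubDate);

public static class FeedItemParser
{
    // Text between openTag and the next closeTag, or null if either is missing.
    static string? Between(string source, string openTag, string closeTag)
    {
        int openPos = source.IndexOf(openTag, StringComparison.Ordinal);
        if (openPos < 0) return null;
        int start = openPos + openTag.Length;
        int end = source.IndexOf(closeTag, start, StringComparison.Ordinal);
        return end < 0 ? null : source.Substring(start, end - start);
    }

    // Cuts the data into <item> blocks, then pulls the fields out of each block.
    public static List<FeedItem> Parse(string data)
    {
        var items = new List<FeedItem>();
        string working = data;
        while (true)
        {
            int openPos = working.IndexOf("<item>", StringComparison.Ordinal);
            if (openPos < 0) break;
            int closePos = working.IndexOf("</item>", openPos, StringComparison.Ordinal);
            if (closePos < 0) break;

            string block = working.Substring(openPos, closePos - openPos);
            items.Add(new FeedItem(
                Between(block, "<title>", "</title>"),
                Between(block, "<link>", "</link>"),
                Between(block, "<pubDate>", "</pubDate>")));

            // Trim past this item's closing tag so the next pass finds the next item.
            working = working.Substring(closePos + "</item>".Length);
        }
        return items;
    }
}
```

Each FeedItem can then be written to a database row with whatever data access code the project already uses.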

If you're looking for a web development company to help you figure out how to do your next project, contact us at Sales & Marketing Technologies.