Sunday, 5 December 2010


Just to let you know, for updates on the blog, and useful links to development tips and tricks as I find them I am on twitter as delradie -!/Delradie

Using Linq to remove duplicate data from a DataTable

In response to a question on experts exchange I just wanted to do a quick post on data cleanup using Linq

Linq has a lot of very useful operations that can be used to reduce the complexity of array (or dataset) operation.

In this instance the operator .Distinct() can be used to remove duplicates in a datatable, with a small priviso:

If you call the method on the datatable (using the call .AsEnumerable() to get it in to a form you can work with in linq) with no parameters you will find that it returns all rows - this is because by default it does an object comparison, and of course all the records have their own memory address, and therefore are considered to be different. The trick is to tell it to use the default DataRowComparer which will get it to do a column by column value check.

One final point to watch out for is in getting the data back in to datatable - remember that datasets/tables/rows are reference types, so clearing down the tabe will actually cause problems, so the trick is to take a copy of the datatable, clear the original, operate on the copy, then reimport the rows to the cleared down table.

All put together,  the following function will handle this for you:

/// <summary>/// Removes duplicate rows from the target table/// </summary>/// <param name="sourceDataSet">Dataset containing the table to operate on</param>/// <param name="targetTableName">Name of the table to operate on</param>
public void GetUniqueRows(DataSet sourceDataSet, String targetTableName)  //Check there is data to operate on

   //Take a copy of the table so clearing does not affect the operation
   //Run the distinct operation
   //Clear all data from the target table  sourceDataSet.Tables[targetTableName].Clear();

  //Re-import the distinct rows to the 'live' table
  foreach (DataRow DistinctRow in OutputRows)
  DataRow[] OutputRows = (from rows in OperatingData.AsEnumerable() select rows).Distinct(System.Data.DataRowComparer.Default).ToArray() as DataRow[];
  DataTable OperatingData = sourceDataSet.Tables[targetTableName].Copy();
  if (sourceDataSet.Tables[targetTableName].Rows.Count == 0)    return;

Friday, 3 December 2010

If you've found any of this information useful

I know that everyone is pretty much charitied out, but if you've found any of the information presented here useful, or just have a good heart, please support a cause that is very dear to me:

Happy Staffie Rescue (, based in the west midlands, are a breed rescue for Staffordshire Bull Terriers - a much maligned but very loving breed (we have 3 of our own).

If you can spare even a small amount this would help make a big difference for some wonderful dogs that really do deserve much better than the hands that life has dealt them.

You can donate online via

Merry Christmas