I have a LARGE DataTable (500k-1m rows). Without going into detail, this is a requirement, as the end user needs/wants to be able to see all of the data. This is on a local server, so bandwidth etc. are not concerns for me.
I have a DateTime field in the DataTable which I need to group by. Let me explain what I mean by grouping... It's probably not what you think I mean (from looking at the other questions on here!).
var table = new DataTable();
table.Columns.Add("EventTime", typeof(DateTime));
table.Columns.Add("Result", typeof(String));
table.Columns.Add("ValueOne", typeof(Int32));
table.Columns.Add("ValueTwo", typeof(Int32));
table.Rows.Add("2012-02-06 12:41:45.190", "A", "7", "0");
table.Rows.Add("2012-02-06 12:45:41.190", "B", "3", "89");
table.Rows.Add("2012-02-06 12:59:41.190", "C", "1", "0");
table.Rows.Add("2012-02-06 13:41:41.190", "D", "0", "28");
table.Rows.Add("2012-02-06 17:41:41.190", "E", "0", "37");
table.Rows.Add("2012-02-07 12:41:45.190", "F", "48", "23");
I would expect the above table to be grouped so that I get a sum of the "ValueOne" column, and an average of the "ValueTwo" column. I need the grouping to be a little bit flexible so that I can specify that I want grouping by minutes (only the first and last rows would be grouped, the rest would just provide their values), or by days (all but the last row would be grouped into a single row), etc.
I've tried this a few times but I'm getting nowhere. My LINQ knowledge isn't great, but I thought I'd be able to do this!
Note: The DataTable is already on the machine for calculations/views which cannot be changed, so saying "Stop being an idiot, filter in SQL!!!" is a valid answer, just useless to me! :-D
Also, in case you missed it in the title, I need this in C# - I'm working with .NET 4.0...
Thanks in advance, assuming you decide to help! :-)
row.Field<DateTime>("EventTime") a contortion or an index? (not to mention a typed DataSet) - Rango 2012-04-03 20:41
row.Field<DateTime>("EventTime") feels like I'm doing contortions compared to event.EventTime. It requires both a cast and a "magic string" value. It's an indexer because I'm asking the row for the value at index "EventTime", and it's not type-safe because if you changed the type of the "EventTime" field, the compiler wouldn't complain. I'm not clear on what a typed DataSet has to do with it, but I'm open to being enlightened - StriplingWarrior 2012-04-03 20:58
A typed DataSet needs no indexer and no casting and is aware of the data model, hence it bypasses all of the disadvantages you mentioned, but it's an extension of a weakly typed DataSet (that's the relation) - Rango 2012-04-03 21:30
The other three answers are close, but as you pointed out they group events that occurred in the same second of the minute, not events that happened in the same second, which is what you want. Try this:
var query = from r in table.Rows.Cast<DataRow>()
let eventTime = (DateTime)r[0]
group r by new DateTime(eventTime.Year, eventTime.Month, eventTime.Day, eventTime.Hour, eventTime.Minute, eventTime.Second)
into g
select new {
g.Key,
Sum = g.Sum(r => (int)r[2]),
Average = g.Average(r => (int)r[3])
};
You can adjust what information you pass to the DateTime constructor to group by different time parts.
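One way to make that adjustment switchable at run time is sketched below. This is only an illustration, not part of the answer above: the TimeGrouping enum, the GroupSummary class, and the Truncate/Summarize helpers are hypothetical names, the column names are taken from the question's sample table, and it assumes those columns contain no DBNull values. It sticks to syntax that compiles for .NET 4.0.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical enum naming the supported granularities.
public enum TimeGrouping { Second, Minute, Hour, Day }

// Hypothetical result type: one row per time bucket.
public class GroupSummary
{
    public DateTime Key { get; set; }
    public int Sum { get; set; }
    public double Average { get; set; }
}

public static class EventGrouping
{
    // Truncates a timestamp to the start of its second, minute, hour, or day,
    // so that all events in the same bucket share one grouping key.
    public static DateTime Truncate(DateTime value, TimeGrouping grouping)
    {
        switch (grouping)
        {
            case TimeGrouping.Second:
                return new DateTime(value.Year, value.Month, value.Day,
                                    value.Hour, value.Minute, value.Second);
            case TimeGrouping.Minute:
                return new DateTime(value.Year, value.Month, value.Day,
                                    value.Hour, value.Minute, 0);
            case TimeGrouping.Hour:
                return new DateTime(value.Year, value.Month, value.Day,
                                    value.Hour, 0, 0);
            case TimeGrouping.Day:
                return value.Date;
            default:
                throw new ArgumentOutOfRangeException("grouping");
        }
    }

    // Groups rows by the truncated EventTime, sums ValueOne and averages
    // ValueTwo per bucket, and returns the buckets in chronological order.
    public static List<GroupSummary> Summarize(DataTable table, TimeGrouping grouping)
    {
        return table.Rows.Cast<DataRow>()
            .GroupBy(r => Truncate((DateTime)r["EventTime"], grouping))
            .OrderBy(g => g.Key)
            .Select(g => new GroupSummary
            {
                Key = g.Key,
                Sum = g.Sum(r => (int)r["ValueOne"]),
                Average = g.Average(r => (int)r["ValueTwo"])
            })
            .ToList();
    }
}
```

For the six sample rows in the question, EventGrouping.Summarize(table, TimeGrouping.Day) should give two rows: 2012-02-06 with Sum 11 and Average 30.8, and 2012-02-07 with Sum 48 and Average 23.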
The only thing you need to change is the property you want to group by.
var query = from x in DataSource
group x by x.EventTime.Minute into g
select new
{
Unit = g.Key,
SumValueOne = g.Sum(y => y.ValueOne),
AverageValueTwo = g.Average(y => y.ValueTwo),
};
Something like this should work:
DataTable dt = GetDataTableResults();
var results = from row in dt.AsEnumerable()
group row by new { EventDate = row.Field<DateTime>("EventTime").Date } into rowgroup
select new
{
EventDate = rowgroup.Key.EventDate,
ValueOne = rowgroup.Sum(r => r.Field<int>("ValueOne")),
ValueTwo = rowgroup.Average(r => r.Field<int>("ValueTwo")) // column is Int32; Field<decimal> would throw InvalidCastException
};
Change row.Field<DateTime>("EventTime").Date to whatever you need - James Johnson 2012-04-03 20:48
Here's what your baseline code could look like:
var query = table.Rows.Cast<DataRow>()
.GroupBy(r => ((DateTime)r[0]).Second)
.Select(g => new
{
g.Key,
Sum = g.Sum(r => (int)r[2]),
Average = g.Average(r => (int)r[3])
});
To add flexibility, you could have something like this:
IEnumerable<IGrouping<object, DataRow>> Group(IEnumerable<DataRow> rows, GroupType groupType)
{
// switch case would be preferable, but you get the idea.
if(groupType == GroupType.Minutes) return rows.GroupBy(r => ((object)((DateTime)r[0]).Minute));
if(groupType == GroupType.Seconds) return rows.GroupBy(r => ((object)((DateTime)r[0]).Second));
...
}
var baseQuery = table.Rows.Cast<DataRow>();
var grouped = Group(baseQuery, groupType);
var query = grouped
.Select(g => new
{
g.Key,
Sum = g.Sum(r => (int)r[2]),
Average = g.Average(r => (int)r[3])
});
query is now an IEnumerable<> of an anonymous type that has Key, Sum, and Average properties. You'd have to create a DataTable out of it yourself. And yes, the way I've implemented this would only be useful for finding out which months tend to be busiest, e.g., whereas you'll need to combine my strategy with David Nelson's GroupBy structure to actually group by month the way it sounds like you want to - StriplingWarrior 2012-04-03 21:10