Die Inhalte auf dieser Seite stehen unter diesem Creative Commens License Dingsbums. Also fröhlich kopieren und weiterverteilen. Bitte abweichende Copyright-Hinweise in anderen Blogs respektiveren.

Montag, 29. März 2010

2010-03-29-Vb20xx-Bottlenecks and traps using Linq with expensive select-expressions

VersionDescription
2010-03-29 T15:51:00Corrected formula for number of calls to new.

Description:

Implement Lazy-Load pattern for IEnumerator.Item[Get].

Problem:

If you are a linq-beginner there is something you should really be aware of before using it. Otherwise some day you may wonder about strange bugs or bad performance of your linq application.

Starting with linq some months ago I meanwhile like it very much and went deeply in using Linq-querys for various use cases. I used the select expression also to instantiate objects as shown in the below code snippet. Doing so i walked in a trap because i was not aware, that the select expression is evaluated each time the object set is enumerated. Also complex queries may be evaluated very often which increases runtime costs when the result tree is not copied.

Consider the following Vb.Net code:

 Dim aCount As Integer = (From aNum In New Integer() {1,2,3} Select New Object).Count

This query seems ok, but the problem is, that new objects are instantiated each time the list is iterated and even if one uses the Count-extension-method. This results in a number of calls to new of n * c + n * i where n is the number of items in the list, c is the number of requests to the count method and i is the number of iterations. The required number of calls to new is more less equal to n.

I now use following extension method to solve this problem and get the desired number of n calls to new:


 Module MyExtensions
  <System.Runtime.CompilerServices.Extension()>_
  Friend Function EvaluateItems(Of T)(ByVal aItems As IEnumerable(Of Func(Of T))) As IEnumerable(Of T)
   Dim aArray As T()
   Dim aCount As Integer = aItems.Count
   ReDim aArray(aCount - 1)
   Dim aIndex As Integer
   Dim aItem As Func(Of T)
   For Each aItem In aItems
    aArray(aIndex) = aItem()
    aIndex = aIndex + 1
    Next
   Return aArray
  End Function
 End Module

 Dim aQuery As IEnumerable(Of Func(Of Object)) = (From aNum In New Integer() {1,2,3} Select Function() New Object)  Dim aCount1 As Integer = aQuery.Count
 Dim aObjects As IEnumerable(Of Object) = aQuery.EvaluateItems
 Dim aCount2 As Integer = aObjects.Count

With the obove code only n constructors calls occur where n is the number of objects in the list.

However, i have one suggestions here.

Of course it is necessarry that the select expression is evaluated each time the item is requested. However, this does not apply if the list is only enumerated using IEnumerator.MoveNext without using IEnumerator.Item[Get]. The Interface design is suitable for implementing a lazy load here but (linq)-implementations of IEnumerable do not follow this.

I also wrote a extension method to (deep)-copy any IEnumerable tree for optimizing complex queries where updating is handled by the owner of the query result. The implementation quite tricky and not not so good so i don't post it here yet. Maybe Microsoft could provide a default extension method or a code snippet to do this.

Beside of this i'd like to suggest linq to implement a lazy load in IEnumerator.Item.Get to allow iteration without evaluating the select expression. For example this is done when counting items or when requesting an item at the end of the object set using ElementAt. For some use cases (for example implementing paging) this behaviour is required.

Solution:

1a. Implement lazy load in certain IEnumerator.Item[Get] implementations. (For example Linq)

Or

1b. provide extension method to create a ghost-list.

2. Provide a IEnumerable-Extension method which deep-copys items in an ienumerable (to an array) to avoid unneccesarry evaluations of the select expression.

TestCase:


 Class C
  Public Sub New()
   InstanceCount = InstanceCount + 1
  End Sub
  Public Shared InstanceCount As Integer
 End Class

 Module Test

  Function MyCount(ByVal a As IEnumerable) As Integer
   Dim aCount As Integer
   Dim aEnumerator As IEnumerator = a.GetEnumerator
   While aEnumerator.MoveNext
    aCount = aCount + 1
   End While
   Return aCount
  End Function

  Sub Test
   Debug.Assert(3 = MyCount((From aNum In New Integer() {1,2,3} Select New C).CreateGhostList)
   Dim aSuggestionImplemented As Boolean = True
   If aSuggestionImplemented Then
    ' This will fail with Vb.Net 2010
    Debug.Assert(0 = C.InstanceCount)
   Else
    ' In Vb.Net 2010 using IEnumerator.MoveNext evaluates the Select expression of a linq-query.
    Debug.Assert(3 = C.InstanceCount)
   End If
  End Sub

  End Module

Freitag, 5. März 2010

Language integration of well know Design-Patterns

Since the mid 70s or so design patterns have become a powerfull way of standardizing the static object model of a software application.

I manage that there seems to be less awareness or understanding that for certain patterns there exist language integrations.

For example since Microsoft Visual Basic 6 there exists the Keyword "Event". Using this single keyword you can define an observable object, and with another keyword (in vb.net it is 'Handles') you can define an Observer pattern in ~3 lines of code. Some years ago in c++ you may have written some hundred lines of code to do so.

Look at the Decorator pattern. This pattern in fundamental way does just the same thing which you can implement in Aspect oriented languages which gives you much more power than standard implementations of the pattern could provide.

Well, this whole thing is quite amazing. It boosts compactness of code to a quite high level. Compared to the code reduction you get generating Assembler code for a function or object this is quantum jump in the ongoing process of defining higher languages.

I hope that more language integrations for well known patterns will become popular and some day we'll have language-integration lists for various patterns.

The homogenity of languages goes on. Some years ago and nowadays industrial software projects may contain a dozens of proprietary interface definition languages and each step between languages or systems requires a lot of marshalling code which is not always generated. Think how many design patterns you use in one software project. May there be each or at least some design patterns implemented using a domain specific language or a special keyword? Well, using .NET Plattform you allready do it. Using Aspect J you allready do it.

I'm thinking of a list of language integrations of well known patterns but compared to the number of patterns there seem to exist not too much for now.

PatternLanguage integration
GOF-ObserverMicrosoft VisualBasic "Event" Keyword/other clr-syntaxes
GOF-DecoratorAspect oriented languages
GOF-StateUML-Statemachine/other statemachines
Reflection -Reflection-APIs of various languages (.NET/CLR, Java,c++ rtti ...)
-Dynamic languages (in restricted way)