Project

General

Profile

Actions

Feature #1434

open

Optimization of the STS hit finder scan over clusters for real data

Added by Pierre-Alain Loizeau over 2 years ago. Updated over 2 years ago.

Status:
In Progress
Priority:
High
Target version:
-
Start date:
11/15/2019
Due date:
% Done:

90%

Estimated time:

Description

The current clusters match finder algorithm in the CbmStsSensorDssd::FindHits method is looping over all possible combinations of Front and back clusters, even if it jumps (continue) all those outside of the time match window.

This leads to unmanageable reco times for the COSY data where we have O(100k) clusters per TS leading to O(10k) hits per TS: the full reco time is ~10s for a 10ms TS.

A simple improvement to the algorithm would be the following:
  1. Ensure the input vector of clusters is time sorted
  2. Prepare a variable storing the index of the first candidate back cluster (= first cluster in time window) for the previous front cluster and initialize it to 0
  3. In the loop on back clusters (within the loop on front clusters), loop only from this index on and stop (break) as soon as a cluster out of the window is found

=> as the front clusters are time sorted, the time window of one of them cannot start before the time window of the previous one
=> as the back clusters are time sorted, the first candidate for a front cluster cannot be placed before the first candidate of the previous front cluster

The attached patch:
- adds sorting binary operator classes against time for the CBM Digi and CBM Hit classes as well as the STS Cluster class, which can be used with the C++ std sorting algorithms
- makes the proposed changes to CbmStsSensorDssd::FindHits

With these changes, a speedup by a factor 400-500 was observed with the COSY data

Potential caveats:
  • the sorting is done on the original vector of clusters, so it should be checked that it does not break a stored indexing in another class/vector (e.g. maybe between Clusters and their Match objects)

Files

StsHitFinderOpti.patch (5 KB) StsHitFinderOpti.patch Pierre-Alain Loizeau, 11/15/2019 10:38 AM
StsHitFinderOpti.patch (4.43 KB) StsHitFinderOpti.patch Pierre-Alain Loizeau, 11/19/2019 03:24 PM
Actions #1

Updated by Volker Friese over 2 years ago

  • Status changed from New to In Progress
  • Assignee changed from Volker Friese to Andreas Redelbach

The STS hit finder was never in earnest developed for time-based reconstruction. The current implementation is taken over from the event-by-event reconstruction, adding jus a check on the time difference between front and bick side cluster to ensure that no unphysical results (hits) are delivered. As you correctly state, this leads to unneccesary looping over the input clusters, which for real-size time slice is a severe performance penalty.

A more appropriate / optimised hit finder is developed at FIAS by Florian Böck, but not yet integrated into the repository. This is expected to happen in the near future (some weeks). I have no problems with your patch as intermediate solution, but I would like Andreas to have a look whether it will not make the integration of the new code harder.

Some general remarks:
  • It seems obvious that any reasonable time-based implementation of the hit finder requires time-sorted input (of clusters). If this is so, then the sorting should be performed by the cluster finder. This would avoid possible problems with the indizes for back referencing.
  • To the comparison needed for the sorting: in your patch, this is implemented as separate class, overloading the constructor with a return value. This is surely not an optimal solution, since a comparator object will be constructed and destructed for each comparison operation. Better solution: define a global function bool CompareDigisbyTime(const CbmDig& digi1, const CbmDigi& digi2). Or (preferrable in my view), make this a static member function of CbmDigi. Pass it as argument to std::sort using std::bind or (probably the fastest) using a Lambda.
Actions #2

Updated by Andreas Redelbach over 2 years ago

Indeed it makes sense to coordinate the optimization of local reconstruction in Sts. Florian Boeck’s thesis has already focused on this, namely:
1. StsDigisToHits as a single task
2. Parallelization and sorting on module level
3. Sliding time window approach for combination of front and back side clusters (in CbmStsSensorDssd::FindHits)

It seems that point 3 corresponds also to Pierre’s suggestion and is essential for optimized runtimes. The strategy for sorting at the digital or cluster level could be discussed, maybe also in a more general context, not only related to Sts. Since we have started at FIAS tests for integration of StsDigisToHits into the trunk, it would be useful to finish the corresponding integration and tests first. These tests are run for very large time slices to really check performances. Since this week, Florian Boeck is actually focusing again on this task.

Actions #3

Updated by Pierre-Alain Loizeau over 2 years ago

@Volker:
The proposed comparators were indeed not optimum, it was a lazy copy of some sorting in littrack. I actually read now that these binary functions are deprecated since c++11/c++14 and removed in c++17.
I had a look at the lambda functions but for this case (sorting) it seems to be used mostly for convenience sake of making the code obvious, as the compiler creates automatically in the same block a free flying function or a function class depending on what is needed.

As we have quite a lot of objects that can be sorted against time, I would prefer to have a single implementation place, which also makes the code simpler in macros when time sorting of extracted sub-arrays is needed.
I made a first attempt with a templated function which works without performance change (on my laptop) for both the hit building (StsClusters) and the usage in root macros (CbmDigis and CbmHits), which you can find in the attached patch.
The other advantage is that to add another sorter in compiled code only the include is needed and for macros only one line in the LinkDef should be added (we could in principle already declare most cluster sorters).

@Andreas:
Indeed, it would be even better to bring all optimizations in one shot, and the sliding window method he found is probably close to what I made.
What I am now unsure is whether we should not for a short time decouple my version into a new "BeamtimeHitFinder" class to avoid merge conflict while providing a reasonable reco for the COSY data as soon as possible. I had already a few people asking me when it will be ready and with mCBM coming next week I would want to avoid having too many parallel local copies on my computers.

Actions #4

Updated by Andreas Redelbach over 2 years ago

General point:
Similar to Pierre I think it is really useful to develop efficient time-sorting to be applied at the levels of digits, clusters, hits and also tracks. Starting this discussion makes sense and one should keep in mind that already time comparisons at the digi level have been developed such as e.g. at
https://redmine.cbm.gsi.de/projects/cbmroot/repository/cbmroot/entry/trunk/fles/mcbm2018/tasks/CbmCheckTiming.cxx#L452

For a BeamtimeHitFinder:
It seems that the patch (number 2) suggested by Pierre is a first steps towards the optimized local reconstruction of Sts. Obviously it does not combine Cluster/Hit finding in one task and also the parallelization at the level of modules is not included, but the central double loop is restricted to a relevant time window and the distribution of StsHits is identical (to the eye).
As a cross-check I have run the StsHitFinder on 100 min. bias events with all StsDigis in 1 timeslice. The number of StsClusters was ~135.000 and corresponding results:
StsHitFinder without patch: 103053 hits, 0.26 s
with patch: 102931 hits, 0.16 s
Due to the non-linear performance of the standard StsHitFinder, the runtimes will be reduced more significantly for greater number of clusters.

Actions #5

Updated by Andreas Redelbach over 2 years ago

I have extended the cross-checks of Pierre’s patch a bit and it turned out that it worked also in the case of StsDigis in more than one timeslice. Actually I also tried to test the potential speedup based on mCBM simulation, but I could only find urqmd input files at 1.65 GeV for 100 events. In this low statistics case for mCBM, the numbers of hits perfectly agree and the runtime was faster by a factor of 1.6 (for ~1000 clusters). So I cannot make a statement about the realistic speedup for mCBM StsHitFinder based on low stat. simulation.

@Pierre:
Concerning the urgency of adding this patch wrt COSY and mCBM: You are probably aware of the practical implications having this patch in the trunk directly or a more extensive version in approx. 2 weeks, can you shortly comment?

@Volker:
The intermediate patch would also lead to new (template) functions in the data directory. If we agree on this, I would also add CbmTrack in data/DataLinkDef.h for CompareCbmDataTime function (in addition to CbmDigi, CbmStsCluster, CbmHit).

Actions #6

Updated by Andreas Redelbach over 2 years ago

  • % Done changed from 0 to 90

We have developed an optimized version of local reconstruction for Sts and thoroughly tested its results in several setups. This optimization includes basically three use cases:

1. Sts hit-finding within the relevant time window after sorting the Sts clusters w.r.t. time. This approach is similar to Pierre's suggestion and indeed needs only minor modifications of existing code.

2. New StsDigisToHits with cluster output: Combination of cluster and hit finding tasks into a new task. This new class also includes processing on the level of Sts modules, thus allowing an efficient parallelization.

3. New StsDigisToHits without cluster output: Identical to the previous case, without storing Sts cluster results

It remains to be coordinated when these updates are commited and if/how we want to include these options also in existing macros.

Actions

Also available in: Atom PDF