  1. SAFe Program
  2. SP-3107

Benchmark the performance of FDAS Agilex FPGA version that uses two DDR SDRAM interfaces and is a direct translation of the Intel Arria 10 family design


Details

    • Feature
    • Not Assigned
    • PI17
    • COM PSS SW
    • None
    • Data Processing

      In PI15 the FDAS FPGA was translated from the older Intel Arria 10 family to the new higher performance Intel Agilex F family. This translation mainly involved generating the special Intel Agilex IP blocks such as the PCIe Interface block, DDR SDRAM Controller Block and the Fourier Transform Blocks and then modifying the original FDAS circuits to integrate and connect to these new IP blocks.

      In PI16 the software that runs on the Host PC to allow communication over the PCIe interface to the Agilex FPGA was evaluated and tested. To perform this task the Agilex FPGA was loaded with an Intel “Reference Design” which contains the PCIe Interface Block and RAMs to store the data from configuration writes and Direct Memory Access (DMA) writes from the PC over the PCIe interface. Read-back of this configuration data and DMA data could then be performed by the PC over the PCIe Interface to confirm the data has been correctly written into the RAMs in the Intel Agilex FPGA. In this way it was possible to prove that the software running on the PC was operating correctly.

      In feature SP-2945 corrections to the FDAS Agilex design were successfully implemented to ensure PCIe accesses by the Host PC operated correctly.

      The object of this feature is to test and performance benchmark the FDAS Agilex design that uses two DDR SDRAM interfaces and compare the results with those for the older Intel Arria 10 family.

      This benchmarking will check the following parameters:-

      • The time for FDAS to process a DM. With the higher internal clock speed of the Agilex family it should be possible to measure an improvement compared to the Arria 10 family.
      • For a known observation DM data set sent from the PC to the FDAS FPGA via a DMA write over PCIe:
        • Check the accuracy of the Filter Output Plane (FOP) which is generated from an observation DM by the Convolution (CONV) module and stored in external DDR SDRAM. The FOP contains 2^22 x 85 samples, each of which is a 32-bit IEEE 754 floating point value. This data can be read back by the host PC via a PCIe DMA access.
        • Check the results of the Harmonic Summing (HSUM) module to confirm it has identified all the pulsar candidates within the observation DM.
      • The power consumed by the card that the Intel Agilex FPGA is fitted to.
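      As a rough sizing check for the FOP read-back listed above, the sketch below computes the storage implied by 2^22 x 85 samples of 32-bit floats. The constant and function names are illustrative only and are not taken from the FDAS code base.

```python
# Sizing sketch for the Filter Output Plane (FOP).
# All names here are illustrative assumptions, not FDAS identifiers.
FOP_LENGTH = 2 ** 22      # the 2^22 dimension quoted in the feature text
FOP_WIDTH = 85            # the "x 85" dimension quoted in the feature text
BYTES_PER_SAMPLE = 4      # one 32-bit IEEE 754 float per sample


def fop_sample_count(length=FOP_LENGTH, width=FOP_WIDTH):
    """Total number of samples in the FOP."""
    return length * width


def fop_size_bytes(length=FOP_LENGTH, width=FOP_WIDTH,
                   bytes_per_sample=BYTES_PER_SAMPLE):
    """Total FOP storage in bytes."""
    return fop_sample_count(length, width) * bytes_per_sample


size = fop_size_bytes()
print(f"{fop_sample_count()} samples, {size} bytes, {size / 2**30:.3f} GiB")
```

      Under these figures a full FOP read-back moves a little over 1.3 GiB across PCIe, which is worth keeping in mind when timing the accuracy check.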

      Given that:-

      • In PI15 an FDAS Agilex design using two DDR Interfaces was created as a direct translation of the Intel Arria 10 family design.
      • In PI16 the software to allow communication with the Intel Agilex FPGA family via the PCIe interface was evaluated and proven to work.
      • In feature SP-2945 of PI17 corrections were successfully made to the FDAS Agilex design to ensure correct operation of the PCIe accesses.

      When a DM observation is passed by the Host PC to the FDAS FPGA via the PCIe interface and the FDAS FPGA is triggered to process the DM.

      Then it shall be possible to benchmark the FDAS version using two DDR SDRAM interfaces in the new Intel Agilex family and compare its performance to that of the old Intel Arria 10 family for processing speed, calculation accuracy and power consumption. These results should indicate whether the Intel Agilex FPGA family is capable of meeting the requirements for the FDAS design in the PSS function.

    • 3
    • 3
    • 0
    • Team_PSS
    • Sprint 5

      Intel Agilex FDAS FPGA Performance

      The processing times of the corrected two DDR SDRAM version of the Intel Agilex FDAS FPGA have been measured.

       This design has been uploaded to the PSS Google Drive in folder FDAS_AGILEX_PI17_RELEASE:-

      Sub-Folder FDAS_PI17_2_DDR_BUILD contains the Quartus Prime build.

      Sub-Folder FDAS_PI17_2_DDR_REPOSITORY contains the files for the repository.

       The specification FDAS_IMPLEMENTATION_2_Draft_C_2.pdf in the FDAS_PI17_2_DDR_REPOSITORY/docs folder contains the predicted and measured DM processing times. These are summarised below:-

       

       

      8 harmonic accelerated pulsar search DM processing time:-

      FDAS Configuration Parameters for the Acceleration Search:

      • 2^22 samples processed (i.e. 4,194,304, approx. 4 million)
      • 85 Filter Output Plane (FOP) Columns generated
      • Convolution module (CONV) performing 7 filter convolutions (becomes 7 +ve acceleration, 7 -ve acceleration) in 6 loops
      • One Summer instance in the Harmonic Summing (HSUM) module
      • Pulsar Fundamental searched for in 262,144 FOP Columns
      • 85 FOP Rows from DDR SDRAM used for the Harmonic summing
      • 21 Pulsar Orbital accelerations searched in the above 262,144 FOP Columns
      • 11 Orbital acceleration Ambiguity Slopes searched

      Convolution : Agilex Measured time = 122ms, Arria 10 Measured time = 106.7 x2 = 213.4ms

      Harmonic Summing : Agilex Measured time = 254ms, Arria 10 Measured time = 336.86ms

      AGILEX TOTAL 8 HARMONIC ACCELERATED SEARCH MEASURED = 376ms
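      The measured figures above can be turned into per-stage speed-up ratios with a few lines of arithmetic. The sketch below is illustrative only: the Arria 10 total is an assumption (a simple sum of the two stage times, which is not stated in the results), and the names are not FDAS identifiers.

```python
# Measured times in milliseconds, copied from the results above.
AGILEX_MS = {"conv": 122.0, "hsum": 254.0}
ARRIA10_MS = {"conv": 106.7 * 2, "hsum": 336.86}


def speedup(old_ms, new_ms):
    """Ratio of old time to new time; values above 1.0 mean the new design is faster."""
    return old_ms / new_ms


agilex_total_ms = sum(AGILEX_MS.values())
# Assumption: the Arria 10 stages run back to back, so their times simply add.
arria10_total_ms = sum(ARRIA10_MS.values())

for stage in AGILEX_MS:
    ratio = speedup(ARRIA10_MS[stage], AGILEX_MS[stage])
    print(f"{stage}: {ratio:.2f}x")
print(f"total: {speedup(arria10_total_ms, agilex_total_ms):.2f}x")
```

      On these numbers the convolution stage gains more than the harmonic summing stage, which is consistent with harmonic summing being the larger part of the Agilex total.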

       

      12 harmonic non-accelerated pulsar search DM processing time:-

      FDAS Configuration Parameters for the Non-Acceleration Search:

      • One Summer instance in the Harmonic Summing (HSUM) module
      • Pulsar Fundamental searched for in 262,144 FOP Columns
      • 1 FOP Row from DDR SDRAM used for the Harmonic summing
      • 1 Pulsar Orbital acceleration searched in the above 262,144 FOP Columns
      • 1 Orbital acceleration Ambiguity Slope searched

      Convolution : Already performed for the 8 harmonic accelerated search

      Harmonic Summing : Agilex Measured time = 136ms

       

      It should be noted that these results are for a DDR4 SDRAM clock frequency of 1200MHz, as this is the maximum frequency for which DDR Controller settings are available from Intel. The intention was to run the DDR interfaces at 1333.333MHz, as this is the maximum that the Agilex FPGA supports. However, discussions with Intel have indicated that 1333.333MHz will not be supported on the Intel Agilex Development Board.

      For the 8 harmonic accelerated search the DDR SDRAM efficiency appears to be approx. 80% on the Agilex, compared to approx. 70% on the Arria 10. However, for the 12 harmonic non-accelerated search the DDR SDRAM efficiency drops to approx. 50%. This is probably because every DDR SDRAM access in the non-accelerated search is to a non-contiguous address location. This case was not tested on the Arria 10, so there are no comparable results.
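      The efficiency figures are not defined in this summary. One common reading, assumed here, is achieved throughput divided by the theoretical peak of the interface. The sketch below shows that calculation; the 64-bit bus width and the byte count in the example are hypothetical placeholders, not measured FDAS values.

```python
def ddr_peak_bytes_per_s(clock_mhz, bus_width_bits):
    """Theoretical peak bandwidth: DDR moves data on both clock edges."""
    transfers_per_s = 2 * clock_mhz * 1e6
    return transfers_per_s * bus_width_bits / 8


def ddr_efficiency(bytes_moved, elapsed_s, clock_mhz, bus_width_bits):
    """Achieved throughput as a fraction of the theoretical peak (assumed definition)."""
    achieved = bytes_moved / elapsed_s
    return achieved / ddr_peak_bytes_per_s(clock_mhz, bus_width_bits)


# Hypothetical example: 1200MHz clock (from the text), 64-bit bus (assumption),
# and an invented transfer of 1.536e9 bytes in 100ms.
example = ddr_efficiency(bytes_moved=1_536_000_000, elapsed_s=0.1,
                         clock_mhz=1200, bus_width_bits=64)
print(f"efficiency = {example:.0%}")
```

      Non-contiguous accesses lower this ratio because the controller spends more cycles opening and closing rows relative to the cycles spent transferring data.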

      The results indicate that a three DDR SDRAM Intel Agilex FDAS version should be able to meet the processing time of 357ms per DM for the 8 harmonic accelerated search. However, extra design effort may be required to additionally achieve a 12 harmonic non-accelerated search within the 357ms limit. A three DDR SDRAM Intel Agilex FDAS version shall be tested in the next PI.
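      To put the 357ms limit in context, the sketch below compares the measured two DDR SDRAM times against that budget. It assumes the 12 harmonic non-accelerated harmonic summing runs after the 8 harmonic accelerated search on the same hardware, so the times add; that sequencing is an assumption, and the names are illustrative.

```python
BUDGET_MS = 357.0               # processing time limit per DM, from the text
ACCEL_8H_TOTAL_MS = 376.0       # measured: convolution + harmonic summing
NONACCEL_12H_HSUM_MS = 136.0    # measured: harmonic summing only


def required_speedup(measured_ms, budget_ms=BUDGET_MS):
    """Factor by which the measured time must shrink to fit the budget."""
    return measured_ms / budget_ms


accel_only = required_speedup(ACCEL_8H_TOTAL_MS)
# Assumption: both searches run sequentially, so their times are summed.
both_ms = ACCEL_8H_TOTAL_MS + NONACCEL_12H_HSUM_MS
both = required_speedup(both_ms)

print(f"8 harmonic only: {ACCEL_8H_TOTAL_MS - BUDGET_MS:.0f}ms over, needs {accel_only:.3f}x")
print(f"both searches:   {both_ms - BUDGET_MS:.0f}ms over, needs {both:.3f}x")
```

      Under this reading the accelerated search alone is about 5% over budget, while fitting both searches would need the combined time to fall by roughly 30%.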

      Currently no power consumption measurements are available, due to difficulty in getting the Intel Agilex Development Card to reliably report power values via the driver software.

      ===================

      Accuracy of the Filter Output Plane (FOP) which is generated from an observation DM by the Convolution (CONV) module and stored in external DDR SDRAM.

      Using data set "fake_p22.699 acc_ph0.21604", the output from CONV was compared to simulations of CONV. For a FOP size of 4832 and overlap of 420, the outputs are a perfect match.
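      If "perfect match" is taken to mean bit-for-bit equality of the 32-bit samples, a comparison along the following lines would report the first differing sample. This is a minimal sketch, not the tool used for the check above; the function name and the little-endian packing in the demo are assumptions.

```python
import struct

SAMPLE_BYTES = 4  # one 32-bit IEEE 754 float per FOP sample


def first_mismatch(hw_data: bytes, sim_data: bytes):
    """Return the index of the first differing 32-bit sample, or None if identical.

    Comparing raw bytes rather than float values keeps the check bit-exact,
    so NaN payloads and signed zeros are also compared.
    """
    if len(hw_data) != len(sim_data):
        raise ValueError("buffers differ in length")
    if len(hw_data) % SAMPLE_BYTES:
        raise ValueError("buffer length is not a whole number of samples")
    if hw_data == sim_data:
        return None
    for offset in range(0, len(hw_data), SAMPLE_BYTES):
        if hw_data[offset:offset + SAMPLE_BYTES] != sim_data[offset:offset + SAMPLE_BYTES]:
            return offset // SAMPLE_BYTES
    return None


# Small demo with made-up values, packed little-endian (assumption).
reference = struct.pack("<4f", 0.0, 1.5, -2.25, 3.0)
altered = struct.pack("<4f", 0.0, 1.5, -2.25, 3.5)
print(first_mismatch(reference, reference))
print(first_mismatch(reference, altered))
```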

      ===================

      Intel Agilex FDAS FPGA Extended Message Signalled Interrupt Commissioning (MSI-X)

      The Extended Message Signalled Interrupts (MSI-X) have been investigated, with attempts made to achieve correct operation. However, according to an Intel support article, a later version of the Intel Quartus Prime software tool is required, as there is a bug in the 22.2 version of the tool that prevents MSI-X operation. Hence this work will continue in a later PI when the Intel Agilex FPGA has been re-built using a later version of the Intel Quartus Prime software. See the following link for the description of the Intel Quartus Prime software tool bug:-

      https://www.intel.com/content/www/us/en/support/programmable/articles/000089935.html

    • 18.3
    • Stories Completed, Outcomes Reviewed, Satisfies Acceptance Criteria, Accepted by FO
    • PI22 - UNCOVERED

    Description

      This feature involves benchmarking the performance of the FDAS Agilex version that uses two DDR SDRAM interfaces, with regard to processing time, operational accuracy and power consumption. The performance can then be compared to that of the FDAS design in the Arria 10 family, and it can be determined whether the Intel Agilex FPGA family is suitable for FDAS to meet the PSS requirements.


              People

                A.Noutsos Noutsos, Aristeidis
                L.Levin-Preston Levin-Preston, Lina

                Feature Progress

                  Story Point Burn-up: (100.00%)

                  Feature Estimate: 3.0

                                Issues   Story Points
                  To Do         0        0.0
                  In Progress   0        0.0
                  Complete      3        24.0
                  Total         3        24.0
