Euclid Consortium

Slide 1

The Role of the EUCLID Archive System in Distributed Data Management & Processing

Rees Williams on behalf of

A.N. Belikov, K. Begeman, D. Boxhoorn, B. Dröge, E. Helmich, J. McFarland, T. Nutma, A. Tsyganov, E.A. Valentijn, W-J Vriend

University of Groningen, Groningen, The Netherlands

B. Altieri, G. Buenadicha, J. Hoar, S. Nieto, J. Salgado, P. de Teodoro

European Space Astronomy Center, European Space Agency, Spain

Martin Melchior (IAL Data Model)

University of Applied Science, Brugg

Slide 2

The Euclid Archive System (EAS) has a central role in the Euclid mission

• EAS acts as the interface between all ground system components
• EAS data is distributed across nine national SDCs
• EAS metadata is centrally stored
• EAS metadata contains all information except images and spectra
• EAS stores the dependencies of data products
  o Avoids unnecessary reprocessing
  o Ensures data provenance
• Builds on experience with the Astro-WISE information system used by
  o OmegaCAM (VST), MUSE (VLT), LOFAR LTA
• Builds on experience with earlier ESA archives
• Traceability and data lineage are strong requirements
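
How stored dependencies can avoid unnecessary reprocessing may be easier to see in a small Python sketch. It assumes, purely for illustration, that every product is recorded together with the pipeline name, configuration and input product ids that produced it. The names `fingerprint`, `DependencyStore` and `process` are hypothetical and are not part of the EAS.

```python
import hashlib
import json


def fingerprint(pipeline, config, input_ids):
    """Hash the pipeline name, configuration and sorted input ids (hypothetical scheme)."""
    payload = json.dumps(
        {"pipeline": pipeline, "config": config, "inputs": sorted(input_ids)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class DependencyStore:
    """In-memory stand-in for centrally stored dependency metadata."""

    def __init__(self):
        self._products = {}  # fingerprint -> product id

    def lookup(self, pipeline, config, input_ids):
        """Return the id of an existing product with identical dependencies, or None."""
        return self._products.get(fingerprint(pipeline, config, input_ids))

    def register(self, product_id, pipeline, config, input_ids):
        """Record which pipeline, configuration and inputs produced a product."""
        self._products[fingerprint(pipeline, config, input_ids)] = product_id


def process(store, pipeline, config, input_ids, run_pipeline):
    """Run the pipeline only if no product with the same dependencies exists."""
    existing = store.lookup(pipeline, config, input_ids)
    if existing is not None:
        return existing, False  # reuse: nothing was reprocessed
    product_id = run_pipeline(input_ids)
    store.register(product_id, pipeline, config, input_ids)
    return product_id, True
```

In this sketch a request with the same pipeline, configuration and inputs returns the recorded product, while changing any of the three triggers a new run.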

Slide 3

EAS-DPS Design

Slide 4

EAS Design

DPS-MDR: Metadata Repository
DPS-MAL: Metadata Access Layer
DPS-MIS: Metadata Ingestion Service
DPS-CUS: Consortium User Services
DPS-CPS: Consortium Processing Services
DPS-MRS: Metadata Replication Service
SAS-MTS: Metadata Transfer Service
SAS-AUS: Archive User Services
COORS: Coordination & Orchestration System
IAL: Infrastructure Abstraction Layer
HPCI: High Performance Computing Infrastructure
DSS-SVR: Distributed Storage System Server
DSS-SNI: Storage Node Interface
CDM: Euclid Common Data Model
SEDM: Science Exploitation Data Model

[Figure: EAS architecture diagram. Components shown: EAS-DPS @ESAC (master) and EAS-DPS @SDC-NL (slave), each with DPS-MDR, DPS-MAL, DPS-CUS, DPS-CPS and DPS-MIS, plus DPS-MRS; EAS-SAS @ESAC with SAS-MDR, SAS-MAL, SAS-MTS, SAS-AUS (CLI, web, GUI) and the SEDM; Advanced Application Services and Consortium User Services; DSS-SVR, DSS-SNI, file systems (FS), IAL and HPCI at SDC 1 to SDC n and at SOC ESAC; COORS; the CDM. Users: science community, EC users, SDC community. Labelled flows: metadata transfer, scientific access, notify SDCs, ingest metadata, ingestion job specification, job specification, query/retrieval of data (from any DSS-SVR).]

Slide 5

• A DSS server installed at each SDC gives other SDCs access to the data at that SDC
• File location in metadata provides a global “file system”
• No constraint on storage solutions at SDCs
• DSS server supports access to data in SDCs using different protocols
  o sftp, https, GridFTP, Astro-WISE dataserver, iRODS
• Python common code with modules to support local file systems
• Simple interface: store, retrieve, copy, delete, check
• Extended interface: cut-out services
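
The five operations of the simple interface can be sketched for a local file system as follows. This is a minimal illustration, not the DSS code: the class name `LocalFileStore`, the method signatures and the use of a SHA-256 checksum in `check` are all assumptions, since the slides name the operations but do not define them.

```python
import hashlib
import shutil
from pathlib import Path


class LocalFileStore:
    """Hypothetical local-file-system backend exposing the five simple operations."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, name):
        # Keep only the final path component so callers cannot escape the root.
        return self.root / Path(name).name

    def store(self, local_file, name):
        """Copy a local file into the store under the given name."""
        shutil.copyfile(local_file, self._path(name))

    def retrieve(self, name, destination):
        """Copy a stored file out to a local destination path."""
        shutil.copyfile(self._path(name), destination)

    def copy(self, name, other_store):
        """Copy a stored file to another store, e.g. one at a different SDC."""
        other_store.store(self._path(name), name)

    def delete(self, name):
        """Remove a stored file."""
        self._path(name).unlink()

    def check(self, name, expected_sha256=None):
        """Report whether the file exists and, if a checksum is given, matches it."""
        path = self._path(name)
        if not path.is_file():
            return False
        if expected_sha256 is None:
            return True
        return hashlib.sha256(path.read_bytes()).hexdigest() == expected_sha256
```

A backend for another protocol (sftp, https and so on) would expose the same five methods, which is one way the choice of storage solution at each SDC could remain unconstrained.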

[Figure: the three EAS components: Data Processing System (EAS-DPS), Science Archive System (EAS-SAS) and Distributed Storage System (EAS-DSS). DSS servers connect data file storage and computing facilities at the SOC and at SDC-FR, SDC-ES, SDC-NL, SDC-DE, SDC-UK, SDC-IT, SDC-CH and SDC-FI.]

Slide 6

Euclid Common Data Model

• Data model in XSD, objects in XML
• Common thesaurus (dictionary)
• EAS is the only interface for data exchange in the SGS; the common data model is the only model for this interface

• Pipeline Processing Order (PPO)
• Processing plan
• SGS infrastructure
• IAL processing definitions

• VIS, NIR, NISP, EXT, MER data products
• Metadata exchanged between pipelines
• Input for scientific data release

• Quality flags
• Configuration parameters
• Common definitions and templates
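
To make the "objects in XML" idea concrete, the sketch below reads one XML object into a Python class using only the standard library. The XML document, its element names and the `DataProduct` class are invented for illustration and do not reproduce the actual Euclid schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical XML object; element names are invented for illustration only.
EXAMPLE_XML = """
<DataProduct>
  <Id>prod-001</Id>
  <Instrument>VIS</Instrument>
  <QualityFlag>0</QualityFlag>
  <DataFile>frame_001.fits</DataFile>
</DataProduct>
"""


@dataclass
class DataProduct:
    """Python-side view of the hypothetical XML object above."""

    product_id: str
    instrument: str
    quality_flag: int
    data_file: str


def parse_data_product(xml_text):
    """Build a DataProduct from XML text, failing loudly on missing elements."""
    root = ET.fromstring(xml_text)

    def required(tag):
        value = root.findtext(tag)
        if value is None:
            raise ValueError(f"missing element <{tag}>")
        return value.strip()

    return DataProduct(
        product_id=required("Id"),
        instrument=required("Instrument"),
        quality_flag=int(required("QualityFlag")),
        data_file=required("DataFile"),
    )
```

Note that `xml.etree.ElementTree` does not validate against an XSD; the hand-written `required` check only stands in for that step, and real schema validation would need a separate library.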

Slide 7

Euclid Common Data Model Implementation

Original Euclid Data Model (XSD)

Python classes

Database schema

User Interface and services

Detailed ECDM objects implemented in full in the EAS

Provides lineage & traceability

Slide 8

Processing and EAS

• Survey Plan → Processing Plans → PPOs
• Each PPO is allocated to an SDC
• EAS keeps all information on each stage of processing
• EAS supports reprocessing with granularity at PPO level
• Backward and forward lineage for PPOs and data products

Slide 9

Metadata/data flow during processing

• SGS components interact via EAS-DPS (CPS and MIS services)
• Data files are exchanged via EAS-DSS
• Processing is triggered by publishing a PPO from COORS to EAS-DPS, where it is retrieved by the IAL
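
The triggering step can be pictured as a publish-and-retrieve exchange. The sketch below is an in-memory stand-in under several assumptions: that the IAL at an SDC asks for the PPOs addressed to it, and that a PPO carries a status. The class names, method names and status values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PPO:
    """Minimal PPO record; the status values are invented for this sketch."""

    ppo_id: str
    pipeline: str
    destination_sdc: str
    status: str = "PUBLISHED"


class MetadataService:
    """In-memory stand-in for the EAS-DPS metadata store (hypothetical)."""

    def __init__(self):
        self._ppos = {}

    def publish(self, ppo):
        """Called by the orchestration side to make a PPO visible."""
        self._ppos[ppo.ppo_id] = ppo

    def retrieve_for(self, sdc):
        """Called by an SDC's IAL: claim every published PPO addressed to it."""
        claimed = [
            p for p in self._ppos.values()
            if p.destination_sdc == sdc and p.status == "PUBLISHED"
        ]
        for ppo in claimed:
            ppo.status = "RETRIEVED"  # a second request will not return it again
        return claimed
```

Because the PPO stays in the store after retrieval, its status and timestamps could later be updated there, which fits the idea that all processing information is kept in one place.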

Slide 10

Metadata in user services

Slide 11

Dbview and processing information

• Pipeline Processing Order (PPO): order to execute a pipeline
  o Contains the pipeline name and the path to the executable
  o Contains the destination (SDC)
  o Contains the status of execution and timestamps for the beginning and end of execution

• Pipeline Run: information on the actual running of a pipeline
  o References to input and output data
  o List of completed pipeline steps (tasks)

• Pipeline Task: the minimum part of a pipeline that can be run in isolation
  o Input/output data for the task
  o Id of the pipeline task in the pipeline

• For each produced data product → Pipeline Run Id → PPO Id → Input data products
• For each input raw frame → PPO list → Pipeline Run list → Output data products
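
The two lineage chains above can be sketched over a list of pipeline run records. This is an illustration only: the `PipelineRun` fields and both function names are hypothetical, and following outputs transitively in the forward direction is an assumption about how far the chain extends.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineRun:
    """One actual execution of a pipeline for a PPO (hypothetical fields)."""

    run_id: str
    ppo_id: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def backward_lineage(product_id, runs):
    """Product -> (Pipeline Run Id, PPO Id, input products) of the run that made it."""
    for run in runs:
        if product_id in run.outputs:
            return run.run_id, run.ppo_id, list(run.inputs)
    return None


def forward_lineage(frame_id, runs):
    """Input -> (PPO list, Pipeline Run list, output products), followed transitively."""
    ppo_ids, run_ids, products = [], [], []
    pending = [frame_id]
    seen = set()
    while pending:
        current = pending.pop()
        for run in runs:
            if current in run.inputs and run.run_id not in seen:
                seen.add(run.run_id)
                run_ids.append(run.run_id)
                if run.ppo_id not in ppo_ids:
                    ppo_ids.append(run.ppo_id)
                products.extend(run.outputs)
                pending.extend(run.outputs)
    return ppo_ids, run_ids, products
```

For example, with one run turning `raw-1` into `cal-1` and a second turning `cal-1` into `cat-1`, the forward lineage of `raw-1` covers both runs and both products, while the backward lineage of `cat-1` points at the second run and its inputs.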

Slide 12

Dbview: Find PPOs

Slide 13

Dbview: Find Pipeline Run for PPO

Pipeline tasks

PPO Id

Slide 14

Dbview: Find tasks for Pipeline Run

Pipeline Run Id

Task log

Slide 15

Quality View

Slide 16

Calibration Service

• Astro-WISE example
• Change any calibration frame/parameter
• Set default

Slide 17

EAS Science Analysis System

• Third component of the EAS
• Data and metadata for public release are copied to the EAS Science Analysis System (SAS) by the Metadata Transfer Service (SAS-MTS)
• The EAS Science Analysis System contains a reduced set of metadata
  o Optimised for science analysis
  o Uses the Euclid Science Exploitation Data Model
• See the poster in the hall on the EAS Science Analysis System for more information