Talend post # 5: Connectors in Talend
We have worked through a few samples now, and in every one of them we used connectors to link our components: source components, target components and intermediary components. It's time to learn a bit more about connectors.
1. What is a connector?
A connector is an element in Talend Open Studio used to logically connect components in a job or sub-job. It defines the action and the flow, and depicts the coordination among the components used in the job.
2. Types of connectors.
Connectors in Talend can be broadly classified into
- Row Connector
- Iterate Connector
- Trigger Connector
- Link Connector
Row Connector: Row connection handles the actual data. The Row connections can be main, lookup, reject or output according to the nature of the flow processed.
- Main: Main is the most commonly used connector. It handles the data flow between components. The rows follow the schema structure defined in the input component. To connect two components using a Main connection, right-click the input component and select Row > Main on the connection list. Alternatively, you can click the component to highlight it, then right-click it and drag the cursor towards the destination component. This will automatically create a Row > Main type of connection. The Main connector comes with two restrictions: you cannot connect two input components to each other using Main, and you cannot have two Main connections flowing into the same target component.
- Lookup: Represented by a dashed line, Lookup is used to handle data from multiple sources. It can only be used when a job needs to process data from more than one source. To add a lookup, drop a new source into a job that already has a Main row connector linking an existing source to a target. Now connect this new component to the target and the connection will automatically be a Lookup connector. The Lookup and Main connectors are interchangeable, i.e. at any time you can change a Lookup to Main (which will automatically convert the existing Main to Lookup). To do so, simply select the lookup connector and then right click
- Filter: This row link specifically connects a tFilterRow component to an output component. It gathers the data matching the filtering criteria. The tFilterRow component also offers a Reject link to fetch the non-matching data flow.
- Reject: This row link connects a processing component to an output component. It gathers the data that does NOT match the filter or is not valid for the expected output. This link allows you to track the data that could not be processed for any reason (wrong type, undefined null value, etc.).
- ErrorReject: This row link connects a tMap component to an output component. This link is enabled when you clear the Die on error check box in the tMap editor and it gathers data that could not be processed (wrong type, undefined null value, unparseable dates, etc.).
- Output: The Output connector is the same as the Main connector, but it is used to process the data that is output by a tMap. This row link connects a tMap component to one or several output components. As a Job can have multiple outputs, you are prompted to give a name to each output row created.
- Uniques/Duplicates: This connector is associated with tUniqRow component. The Uniques link gathers the rows that are found first in the incoming flow. This flow of unique data is directed to the relevant output component or else to another processing subjob. The Duplicates link gathers the possible duplicates of the first encountered rows. This reject flow is directed to the relevant output component, for analysis for example.
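To make the split between row links a little more concrete, here is a minimal plain-Java sketch of how rows get routed to Filter/Reject and to Uniques/Duplicates. This is only an illustration of the idea, not the code Talend generates; the class `RowRoutingSketch`, the `Split` holder and both method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Plain-Java illustration only; this is NOT code generated by Talend.
// All names here are hypothetical.
class RowRoutingSketch {

    // Holds the two output flows of a component,
    // e.g. Filter/Reject or Uniques/Duplicates.
    static final class Split<T> {
        final List<T> accepted = new ArrayList<>();
        final List<T> rejected = new ArrayList<>();
    }

    // Mimics the Filter and Reject links: rows matching the condition
    // go to one flow, all other rows go to the reject flow.
    static <T> Split<T> filterRows(List<T> rows, Predicate<T> condition) {
        Split<T> out = new Split<>();
        for (T row : rows) {
            if (condition.test(row)) {
                out.accepted.add(row);
            } else {
                out.rejected.add(row);
            }
        }
        return out;
    }

    // Mimics the Uniques and Duplicates links: the first occurrence of a
    // row is unique, every later occurrence of the same row is a duplicate.
    static <T> Split<T> splitUniques(List<T> rows) {
        Split<T> out = new Split<>();
        Set<T> seen = new HashSet<>();
        for (T row : rows) {
            if (seen.add(row)) {          // add() returns false if already seen
                out.accepted.add(row);
            } else {
                out.rejected.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(17, 42, 42, 8, 65);

        Split<Integer> adults = filterRows(ages, age -> age >= 18);
        System.out.println("Filter: " + adults.accepted + " Reject: " + adults.rejected);

        Split<Integer> uniq = splitUniques(ages);
        System.out.println("Uniques: " + uniq.accepted + " Duplicates: " + uniq.rejected);
    }
}
```

The point to take away is that every incoming row ends up on exactly one of the two links, so nothing is silently dropped.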
Iterate Connector: The Iterate connector helps developers who want to loop over DB entries, rows in a file, files in a directory, etc. Certain built-in components, such as tFileList, are designed to be used with the Iterate connector. Only one component can be iterated at a time, i.e. only one component can be the target of an Iterate connector.
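Conceptually, an Iterate connection behaves like a loop: the target runs once for each item the source produces. The sketch below shows that idea in plain Java for the "files in a directory" case. It is an analogy only, not Talend's generated code, and `IterateSketch` and `iterateOverFiles` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Consumer;

// Plain-Java analogy of an Iterate connection; names are hypothetical.
class IterateSketch {

    // Runs the single target once per regular file found in the directory
    // and returns the number of iterations performed.
    static int iterateOverFiles(Path directory, Consumer<Path> target) throws IOException {
        int iterations = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(directory)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {   // skip sub-directories
                    target.accept(file);           // one execution of the target
                    iterations++;
                }
            }
        }
        return iterations;
    }

    public static void main(String[] args) throws IOException {
        // Uses the directory given on the command line, or the current one.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        int n = iterateOverFiles(dir, file -> System.out.println("Processing " + file.getFileName()));
        System.out.println(n + " iteration(s)");
    }
}
```

Note that the loop has a single `target`, which mirrors the rule that only one component can be the target of an Iterate connector.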
Trigger Connector: Trigger connectors don't handle data. Instead, they control the order in which the parts of a job are executed; in other words, they define the sequence of processing during a job execution. Trigger connectors fall into two broad categories:
- Subjob triggers
  - OnSubjobOK
  - OnSubjobError
  - Run if
- Component triggers
  - OnComponentOK
  - OnComponentError
  - Run if
- OnSubjobOK: Use OnSubjobOK when you want your subjob to be executed only if your main job completed without any errors. This connection is to be used only from the start component of the Job. These connections are used to orchestrate the subjobs forming the Job or to easily troubleshoot and handle unexpected errors.
- OnSubjobError: Use OnSubjobError when you want your subjob to be executed if errors are reported by your main job. This “on error” subjob helps flag the bottleneck or handle the error if possible.
- OnComponentOK: It is used to execute the target component only when the source component has executed without any errors. An example would be to load a particular library first and then perform a task based on that library. Until the library is loaded, you don't want the other tasks to be executed.
- OnComponentError: It triggers the sub-job or component as soon as an error is encountered in the primary Job.
- Run if: Use Run if when you want your subjob or target component to be triggered only when certain conditions are met.
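If you think in code, the subjob triggers map roughly onto a try/catch plus an if. The sketch below is a loose plain-Java analogy, not what Talend generates. `TriggerSketch` and `runJob` are hypothetical names, and it assumes the Run if condition is checked only after the first subjob has finished successfully.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Plain-Java analogy of trigger connections; names are hypothetical.
class TriggerSketch {

    // Runs the first subjob and records which follow-up steps were triggered.
    static List<String> runJob(Runnable firstSubjob, BooleanSupplier runIfCondition) {
        List<String> executed = new ArrayList<>();
        try {
            firstSubjob.run();
            executed.add("first subjob");
        } catch (RuntimeException e) {
            // OnSubjobError: only reached when the first subjob failed.
            executed.add("OnSubjobError subjob");
            return executed;
        }
        // OnSubjobOK: only reached when the first subjob had no errors.
        executed.add("OnSubjobOK subjob");

        // Run if: triggered only when the condition evaluates to true.
        // Assumption: the condition is evaluated after a successful run.
        if (runIfCondition.getAsBoolean()) {
            executed.add("Run if subjob");
        }
        return executed;
    }

    public static void main(String[] args) {
        System.out.println(runJob(() -> { }, () -> true));
        System.out.println(runJob(() -> { throw new IllegalStateException("boom"); }, () -> true));
    }
}
```

No rows are passed between the steps here, which matches the point that triggers only decide what runs next and never carry data.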
Link Connector: The Link connection can only be used with ELT components. These links transfer table schema information to the ELT mapper component in order to be used in specific DB query statements. The Link connection therefore does not handle actual data but only the metadata regarding the table to be operated on. When right-clicking the ELT component to be connected, select Link > New Output.
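To illustrate "metadata only", here is a small plain-Java sketch in which table schemas, and no rows at all, are used to build a SQL statement. It is a simplified guess at the kind of statement an ELT mapper might assemble, not Talend's actual output. `LinkSketch`, `TableSchema`, `buildInsertSelect` and the table and column names are all invented for this example.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of passing only table metadata; names are hypothetical.
class LinkSketch {

    // Table metadata: a name and its column names. No row data is stored.
    static final class TableSchema {
        final String table;
        final List<String> columns;

        TableSchema(String table, String... columns) {
            this.table = table;
            this.columns = Arrays.asList(columns);
        }
    }

    // Builds an INSERT ... SELECT statement purely from the two schemas.
    // The database itself would move the data when the statement is executed.
    static String buildInsertSelect(TableSchema source, TableSchema target) {
        return "INSERT INTO " + target.table
                + " (" + String.join(", ", target.columns) + ")"
                + " SELECT " + String.join(", ", source.columns)
                + " FROM " + source.table;
    }

    public static void main(String[] args) {
        TableSchema source = new TableSchema("customers_raw", "id", "name");
        TableSchema target = new TableSchema("customers", "id", "name");
        System.out.println(buildInsertSelect(source, target));
    }
}
```

The sketch never reads a single row, which is the difference from a Row connection: only the description of the tables travels along the link.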