connectors in Talend

Talend post # 5: Connectors in Talend

We have worked through a few samples now, and in every sample we have used connectors to link our various components, i.e. source components, target components and intermediary components. It's time to get to know connectors a bit better.

1. What is a connector?

A connector is a component in Talend Open Studio used to connect the other components in a job or subjob logically. It defines the action and the flow, and depicts the coordination among the components used in the job.

2. Types of connectors.

Connectors in Talend can be broadly classified into

  • Row Connector
  • Iterate Connector
  • Trigger Connector
  • Link Connector

Row Connector: A Row connection handles the actual data. Row connections can be Main, Lookup, Reject or Output, according to the nature of the flow processed.

  • Main: Main is the most commonly used connector. It handles the data flow between components, and the rows follow the schema structure defined in the input component. To connect two components using a Main connection, right-click the input component and select Row > Main from the connection list. Alternatively, click the component to highlight it, then right-click and drag the cursor towards the destination component; this automatically creates a Row > Main connection. There is a restriction on the Main connector: you cannot connect two input components with a Main connection, and you cannot have two Main connections flowing into the same target component either.

 

  • Lookup: Represented by a dashed line, a Lookup connection is used to handle data from multiple sources. It can only be used when data from more than one source needs to be processed in a job. To add a lookup, drop a new source component into a job that already has a Main row connection between an existing source and a target; when you connect this new component, the connection automatically becomes a Lookup. Lookup and Main connections are interchangeable: at any time you can change a Lookup to Main, which automatically converts the existing Main to a Lookup. To do so, simply right-click the Lookup connection and choose the option to set it as the Main connection.

 

 

 

  • Filter: This row link specifically connects a tFilterRow component to an output component. It gathers the data matching the filtering criteria. The component also offers a Reject link to fetch the non-matching data flow.

 

  • Reject: This row link connects a processing component to an output component. It gathers the data that does NOT match the filter or is not valid for the expected output. This link allows you to track the data that could not be processed for any reason (wrong type, undefined null value, etc.).

 

  •  ErrorReject: This row link connects a tMap component to an output component. This link is enabled when you clear the Die on error check box in the tMap editor and it gathers data that could not be processed (wrong type, undefined null value, unparseable dates, etc.).
  • Output: The Output connector is the same as Main but is used to process the data output by a tMap. This row link connects a tMap component to one or several output components. As a Job can have multiple outputs, you are prompted to give a name to each output row created.
  • Uniques/Duplicates: This connector is associated with the tUniqRow component. The Uniques link gathers the rows found first in the incoming flow; this flow of unique data is directed to the relevant output component or to another processing subjob. The Duplicates link gathers the possible duplicates of the first encountered rows; this reject flow is directed to the relevant output component, for analysis for example.
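Outside Talend, the Uniques/Duplicates split can be sketched in plain Java like this (rows are reduced to strings for brevity; in the real component the comparison is done on the key columns you select):

```java
import java.util.*;

public class UniqSplit {
    // Split an incoming flow into first-seen rows ("Uniques")
    // and later repeats of those rows ("Duplicates").
    public static Map<String, List<String>> split(List<String> rows) {
        Set<String> seen = new HashSet<>();
        List<String> uniques = new ArrayList<>();
        List<String> duplicates = new ArrayList<>();
        for (String row : rows) {
            if (seen.add(row)) {
                uniques.add(row);    // first occurrence goes to the Uniques flow
            } else {
                duplicates.add(row); // repeats go to the Duplicates flow
            }
        }
        Map<String, List<String>> out = new LinkedHashMap<>();
        out.put("uniques", uniques);
        out.put("duplicates", duplicates);
        return out;
    }
}
```

Both output flows preserve the order of the incoming rows, which matches the "first encountered" wording above.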

 

 

Iterate Connector: The Iterate connector helps when you wish to loop over DB entries, rows in a file, or files in a directory, etc. Certain built-in components, such as tFileList, are associated with the Iterate connector. Only one component can be iterated at a time, i.e. only one component can be the target of an Iterate connector.
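Conceptually, an Iterate link behaves like a plain loop over the items being listed. A minimal Java sketch of iterating over the files in a directory (the directory path is whatever your job points at) looks like this:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class IterateFiles {
    // Mimics tFileList + Iterate: visit each file in a directory one at a time.
    public static List<String> fileNames(String dir) {
        List<String> names = new ArrayList<>();
        File[] files = new File(dir).listFiles();
        if (files != null) {
            for (File f : files) {    // one iteration per file, like the Iterate link
                if (f.isFile()) {
                    names.add(f.getName());
                }
            }
        }
        return names;
    }
}
```

In a real job, the body of the loop would be the single target component of the Iterate connector.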


Trigger Connector: Trigger connectors don't handle data; they define the order in which the parts of a job are executed, in other words the sequence of processing that happens during a job execution. Trigger connectors fall into two broad categories:

  • Subjob triggers
    • OnSubjobOK
    • OnSubjobError
    • Run if
  • Component triggers
    • OnComponentOK
    • OnComponentError
    • Run if


  • OnSubjobOK: Used when you want a subjob to execute only after the preceding subjob has completed without any errors. This connection can only be used from the start component of the subjob. These connections help orchestrate the subjobs forming the Job and make it easier to troubleshoot and handle unexpected errors.
  • OnSubjobError: Used when you want a subjob to execute only if the preceding subjob ended in error. This "on error" subjob helps flag the bottleneck or handle the error if possible.
  • OnComponentOK: Used to execute the target component only after the source component has finished without any errors. An example would be loading a particular library first and only then performing tasks based on that library: until the library is loaded, you don't want the other tasks to execute.


 

 

  • OnComponentError: Triggers the target subjob or component as soon as an error is encountered in the source component.
  • Run if: Used when you want the target subjob or component to be triggered only when a certain condition (a Java boolean expression) is met.
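As a rough plain-Java analogue (outside the Studio, so the runtime context is stubbed out as a simple parameter), a Run if trigger boils down to guarding the next step with a boolean condition:

```java
public class RunIfDemo {
    // Plain-Java sketch of a Run if trigger: the target subjob runs
    // only when the condition evaluates to true. In Talend the condition
    // would typically read runtime values (e.g. row counts) from globalMap.
    public static String trigger(int rowsRead) {
        boolean condition = rowsRead > 0; // e.g. "did the previous component read any rows?"
        return condition ? "subjob executed" : "subjob skipped";
    }
}
```

In the Studio itself, the condition is typed into the Run if link's settings as a Java expression.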

Link Connector: The Link connection can only be used with ELT components. These links transfer table schema information to the ELT mapper component in order to be used in specific DB query statements. The Link connection therefore does not handle actual data but only the metadata regarding the table to be operated on. When right-clicking the ELT component to be connected, select Link > New Output.

Mongo DB operations using Talend

Talend post # 4:

Talend provides strong integration capabilities with big data technologies with its built-in big data components.

Let's see how to connect to the very popular noSQL database MongoDB with Talend Open Studio.

1. Open Talend Open Studio and check the Palette on the right-hand side to see whether the MongoDB components are installed and available.

If not, download the needed MongoDB components from the Talend Exchange forum:

http://www.talendforge.org/exchange/

In this example I am using the tMongoDBConnection and tMongoDBOutput components.

You will also need the MongoDB driver for Java (mongo-java-driver-2.12.4.jar).

Once you have all the prerequisites ready, you can proceed with job design.

2. Make sure you have your mongoDB installed and configured. (I have explained the steps below).

Once MongoDB is up and running, open Talend Open Studio and create a new job.

In the job design area, drag and drop the tLibraryLoad component to load the MongoDB driver. In the component panel for tLibraryLoad, locate the MongoDB driver for Java on your local PC.

 

3. Drag and drop the tMongoDBConnection component into the job area.

In the component panel for this component, supply the details of the server where MongoDB is running, the port on which it is listening, and the database name.

** Since I am running MongoDB on my PC I have given localhost; I have already created a database called sample, and the port is the default port MongoDB uses when you install it.

4. Once the tMongoDBConnection component is configured, connect the tLibraryLoad and tMongoDBConnection components (use the connection type OnComponentOK).

** We will talk about the various types of connectors in an upcoming post.

 

 

5. Until now we have built the environment needed to connect to MongoDB. Now we need to do the actual task: load data from a delimited flat file called books into a MongoDB collection.

6. Assuming you already have the books metadata, drag and drop it into the job design area and configure it in the component panel.

 

 

 

7. Once this is done, connect the books component with tMongoDBConnection (use the OnComponentOK connector). Now add a tMongoDBOutput component to the job design area to insert the data into MongoDB. Add a tMap component, and connect books to tMap and tMap to tMongoDBOutput.

Once all the connections are done, we need to configure our tMap.

8. Double-click to open the tMap component. Create the output schema by selecting the fields you want to insert into MongoDB. Perform any transformations if needed, though I have kept it simple in this sample.

Click OK and close the editor.

9. Go to the component panel for tMongoDBOutput and tick the Use existing connection check box, because we are reusing the connection we already obtained via the tMongoDBConnection component. Also provide the name of the collection that is going to hold the data in MongoDB.

Once all the above is done, click Save and run the job using the run panel.

Open MongoDB and verify whether the collection contains the data.


 

 

Use the following commands at the mongo shell:

a. show dbs — lists all the databases currently set up. (We used sample in the tMongoDBConnection component, so we should see sample; otherwise we need to create one.)

b. use sample — switches to the desired database.

c. db.books.find() — lists all the records in the books collection.

creating a jasper report in Talend

Talend post #3:

Today let's work on some reporting. What I am going to do today is read a customer file, produce a report flagging any customer whose phone number starts with '+0', and write an error message against it.

1. Create the metadata for customer (I am using a delimited file input) and add it to the new job you are creating.


2. Drag and drop, or key in, the tJasperOutput component. This is going to be our target component for generating the Jasper report. Configure the component in the component panel by selecting PDF as the type.

Also provide the path where your JRXML file* is located, as well as the destination path where your report should be created.

note: we will see what a JRXML file is and how to create one at the end of this article.

 

3. Connect the source and target. Once done, we need some transformation to map the source fields to the target fields: drag and drop a tMap component onto the row that connects source and target, and join them together again.


4. Transformations: I am using very basic transformations in this sample. First I am going to change the case of the name from lower to upper, and then I am going to transform the phone number format from 'xxxxxxxxxx' to '+(xxx) xxx xxxx'.

Double click and open the tMap component.

In the target columns, click the edit button against the name field to open the expression builder. Select the appropriate function from the list to change the case to upper.

Click OK and close the expression builder.

5. Under the variables section in tMap, create a new variable and write your logic to transform the phone number format. I used the following code, but you can use any function/method to make it simpler:


"+" + StringHandling.LEFT(row1.phone,1) + "(" + row1.phone.substring(1,4) + ") " + row1.phone.substring(4,7) + " " + row1.phone.substring(7)
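Outside Talend, the same expression can be sanity-checked in plain Java; StringHandling.LEFT(s, 1) is equivalent to s.substring(0, 1):

```java
public class PhoneFormat {
    // Mirrors the tMap variable: keep the first digit after a leading '+',
    // then group the remaining digits as (xxx) xxx followed by the rest.
    public static String format(String phone) {
        return "+" + phone.substring(0, 1)            // StringHandling.LEFT(phone, 1)
                + "(" + phone.substring(1, 4) + ") "
                + phone.substring(4, 7) + " "
                + phone.substring(7);
    }
}
```

For a 10-digit input such as 1234567890 this yields +1(234) 567 890.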

Once done, connect the variable to the intended target column.


Save your flow and execute it by clicking the run button.


Below are the generated report and the source customer details (rename to .txt):

Reject Report                 customer

creating a JRXML to generate a jasper report.

jasperLayout.jrxml

JRXML is an XML template created using standard XML format. <jasperReport> is the root tag, with numerous sub-tags to define your report. This XML is fed to the JasperReports engine, which in turn processes the file and creates a report.

The template contains two main sections under the root tag. The <columnHeader> section and the <detail> section.

The <columnHeader> section is used to define the elements that are going to appear in your report, and the <detail> section indicates the values for the elements defined in <columnHeader>.

below are the main tags that I have used.

<field name>: This element is used to map data from datasources or queries into report templates.

<fieldDescription>: maps the field name with the appropriate element in the XML file.

<staticText>: This defines static text that does not depend on any datasources, variables, parameters, or report expressions.

<textField>: defines a field that appears on the report.

<textFieldExpression>: This defines the expression whose result is displayed in the text field.

$F{ErrorMessage}: This is an example of a field reference; it holds the value of the field predefined in the <field name> tag.

<band>: Bands contain the data that is displayed in the report.
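Putting these tags together, a stripped-down JRXML skeleton might look like the following. This is only a sketch: a real template needs the full root-element attributes (namespace, page size, column widths), and the field name here follows the ErrorMessage example above.

```xml
<jasperReport name="rejectReport">
  <field name="ErrorMessage" class="java.lang.String">
    <fieldDescription><![CDATA[ErrorMessage]]></fieldDescription>
  </field>
  <columnHeader>
    <band height="20">
      <staticText>
        <reportElement x="0" y="0" width="200" height="20"/>
        <text><![CDATA[Error Message]]></text>
      </staticText>
    </band>
  </columnHeader>
  <detail>
    <band height="20">
      <textField>
        <reportElement x="0" y="0" width="200" height="20"/>
        <textFieldExpression><![CDATA[$F{ErrorMessage}]]></textFieldExpression>
      </textField>
    </band>
  </detail>
</jasperReport>
```

The <detail> band is repeated once per row of data, so each reject record produces one line in the report.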

filtering data in Talend

Talend post # 2: In my previous post we saw how to create a simple data integration job using Talend Open Studio.

Let’s see how to filter the data and process only selective data.

We will see the usage of the components tFilterColumns and tFilterRow, which filter data in different ways.

1. Drag and drop the already created books metadata into the job design area.


Talend will pop up relevant options for how you want to use this metadata. Select tFileInputDelimited from the options, as we want this metadata to represent a delimited file that is the input for this job.


2. Add the components to filter your data. I am going to add ‘tFilterRow’ first.


 

 

 

Configure your component by supplying the filter conditions in the component panel.

Select the column you want to filter on and supply the filter condition. In my example I am processing all the books published in the year 2015.
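Stripped of the Studio machinery, the tFilterRow step here amounts to a simple predicate over the rows. A plain Java sketch (the Book fields are illustrative, not the actual books schema):

```java
import java.util.ArrayList;
import java.util.List;

public class YearFilter {
    // A book row reduced to the fields used by the filter (illustrative).
    public static class Book {
        public final String title;
        public final int year;
        public Book(String title, int year) { this.title = title; this.year = year; }
    }

    // Equivalent of tFilterRow with the condition year == 2015:
    // keep only the rows whose year matches.
    public static List<Book> publishedIn(List<Book> rows, int year) {
        List<Book> out = new ArrayList<>();
        for (Book b : rows) {
            if (b.year == year) {
                out.add(b);
            }
        }
        return out;
    }
}
```

tFilterRow also lets you combine several such conditions with AND/OR, which would simply extend the if-test above.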

3. Now that I have the component to filter the records I want to process, I want to process only the columns I am interested in. For this I will use the tFilterColumns component. Drag and drop it, or add it to the job design area by keying it in.


Connect the two components, tFilterRow and tFilterColumns. You cannot simply connect them this time, as you are performing a filter operation: right-click the tFilterRow component, then select Row > Filter. Now you will be able to connect tFilterRow to tFilterColumns.

Once you connect the two components, select tFilterColumns and navigate to component panel and click on edit schema to define your new schema/columns.

A popup window will open showing all the columns you can select from. In this sample I selected only three columns: ISBN, Book_title and year. You can also create a new column that is not available at the source by clicking the add (green +) button.


 

 

 

 

4. Select a target where you want this data to be written. In this case, to keep things simple, I am using a tLogRow component to write the data to the log.

 

Configure how tLogRow should write content to the log. I selected the Table option to display the content in a more readable format.

5. Test your job by clicking the run button in the run panel.


creating a simple job using Talend open studio

Talend Post# 1 : Let’s see how to create a very basic job using Talend Open Studio. ( I am using Talend Open Studio version 5.6)

Open the Talend Open Studio for Data Integration (assuming you have created a project).

1. Right-click Job Designs and select Create job.


2. Supply the details of the new job (you can leave the optional fields blank).


3. In the job design area, drag and drop the tFileInputDelimited component. Alternatively, you can simply place the cursor anywhere in the job design area and start typing, and the editor will suggest various components based on what you key in.

At the bottom of the editor there will be a component panel available to configure this newly added component for input delimited file.

Supply the values according to your environment, i.e. the path of the file, the delimiter, etc.


4. Now we have our input configured; it's time to configure our output. There are multiple ways to do this. In this example I am going to convert a CSV file with many columns into a simplified XML file with a few columns.

So either drag and drop the tFileOutputXML component, or select it from the suggestions by typing. In the component panel below, simply provide the path where you want this file to be created.

Note that we are not creating the structure of this XML file at this point.

 


5. Configuring the output schema/mapping: before you configure your output schema, connect the two components. Simply highlight the input component by clicking on it, then right-click and drag towards the output component, releasing the mouse at the target component.


Once both components are connected, drop a tMap component onto the connector. Double-click to open the tMap component. You will see all the fields/columns from the input. For the output component you can define columns using the add button available at the bottom.

Once all the desired output columns are defined, you can either drag and drop the source columns onto the destination columns to map them, or select the auto map option (auto map works only if the source and target column names are identical).
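Reduced to its essence, this job maps a delimited record to a simplified XML element. A minimal Java sketch (the column positions, delimiter and tag names are illustrative, not taken from the sample file):

```java
public class CsvToXml {
    // Convert one semicolon-delimited record into a simplified XML element,
    // keeping only the first two columns, roughly what the tMap does here.
    public static String toXml(String csvLine) {
        String[] cols = csvLine.split(";");
        return "<row><isbn>" + cols[0] + "</isbn>"
                + "<title>" + cols[1] + "</title></row>";
    }
}
```

In the real job, tFileInputDelimited handles the splitting, tMap selects and maps the columns, and tFileOutputXML writes the elements out.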

Click Apply and then OK once done.

6. Execution: at the bottom there is a run panel. Save your job and click the run button.


Once the job executes successfully, you can see the statistics of records processed by your job.


iWay Data Migrator vs Talend open studio

post # 2: Yes, you read that correctly. This is numbered post # 2 because it was not intended to be my first post. Moving on to the subject.

Having been a developer in the IT industry for more than 8 years, with a multi-skill set and experience using various tools for software development and ETL processes, I have learned one thing: all these tools provide almost the same features; what differs is how you use them.

Today I am going to compare two data integration tools: Data Migrator from IBI and Talend Open Studio from Talend. I have working experience with Data Migrator, whereas I am just playing around with Talend Open Studio.

Programming style: Though both are drag-and-drop tools, we still sometimes need to write some code. If you are using Data Migrator you should be well aware of the PL/SQL programming style, as WebFOCUS (the language used for coding in Data Migrator) leans towards PL/SQL-style programming, whereas Talend is Java based: to use Talend Open Studio, one should be comfortable programming in Java.


 

 

Editors: IBI has a custom-built, dedicated editor for Data Migrator called the DMC (Data Migrator Console). Talend Open Studio, on the other hand, is an Eclipse-based editor. Both editors are handy to use, with the much-needed features accessible at a single mouse click. But when the actual work begins, the DMC is easier to use, as developing a job there is pretty straightforward; you get good help on all the built-in functions, as well as examples of how to use them, within the editor. Talend Open Studio, on the other hand, is a little tricky, especially for first-time users: you need to scratch your head a bit to understand creating transformations and connecting your sources and targets to a tMap. However, once you are familiar with the options, it's just a game from there on.

Execution: Executing and testing your code is extremely important, and both iWay Data Migrator and Talend Open Studio make it extremely simple. iWay Data Migrator provides a SET component wherein you can initialize your runtime variables with test data. However, there is one drawback: if the SET component is not removed from the code after testing, it eventually gets promoted to your other environments, and if the runtime variables are initialized with test data using the 'SET' property rather than the 'Default' property, the actual runtime values will be overwritten by the values in the SET component (only if the actual assignment to the variables happens before the SET component is invoked). Talend, on the other hand, provides contexts, which is a very useful feature. You can define various global variables (DB connection strings, usernames, passwords, etc.) pertaining to various contexts, and when you want to test a piece of code you have the option to select the context in which to execute it and examine the desired behavior.

 

 

 

 

Adapters: Though iWay Data Migrator has some powerful adapters to connect to various kinds of software interfaces and integrate heterogeneous software systems, it is Talend Open Studio that takes the lead when it comes to adapter support. It has an adapter for almost every technology existing today, from EDIFACT to SAP, basic delimited files to JSON documents, and traditional RDBMSs to noSQL databases like CouchDB and HBase.
