Monday, April 8, 2013

Amazon, Jaspersoft, and the Future of Cloud Computing


Last month, Jaspersoft announced the industry’s first completely pay-as-you-go reporting and analytic service on Amazon’s AWS Marketplace. With this service, you can literally be up-and-running (analyzing your data) in less than 10 minutes and pay as little as 52 cents per hour to do so. And, as we’ve just announced, Amazon and Jaspersoft added more than 100 customers during the first month of availability – a great start to a new service destined to change the way BI is consumed for many purposes.

One of my favorite university professors recently asked me what worries me the most about being on the cutting edge with Amazon and this new service. My response: NOT being on the cutting edge with Amazon and this new service. In other words, I would worry most about not innovating in this way. Disrupting through both our business model and product innovation is a critical part of our culture at Jaspersoft.

In fact, the early success of our new Amazon-hosted service reminded me of two fast-emerging, inter-related cloud computing concepts that, though not discussed sufficiently, will have a substantial impact on the future usage and adoption of cloud-based computing services. These two concepts are: cloud-originated data and the post-transactional cloud *1.  I maintain that, as the former quickly grows, the latter becomes commonplace.

Cloud-Originated Data
While the total digital universe currently weighs in at nearly 3 zettabytes, it is estimated that more than one exabyte of that data is stored in the cloud. Each day, the growth rate of cloud-originated data increases, because of the explosion in services and applications that rely on the cloud as infrastructure. So, a disproportionate amount of the 13X growth projected in the digital universe between now and 2020 will come from cloud-originated data. IDC estimates that by 2020, nearly 40% of all information will be “touched” by cloud computing (somewhere from origination to disposal). Eventually, most of the digital universe will be cloud-based.
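
To put those figures side by side, here is a rough back-of-the-envelope sketch. It uses only the rounded numbers quoted above (3 zettabytes, one exabyte, 13X, 40%); the variable names are mine, and the results are approximations rather than published estimates.

```python
# Back-of-the-envelope arithmetic using the rounded figures quoted above.
ZB = 10**21  # bytes in a zettabyte (decimal)
EB = 10**18  # bytes in an exabyte (decimal)

total_now = 3 * ZB       # "nearly 3 zettabytes" (rounded up)
cloud_now = 1 * EB       # "more than one exabyte" (rounded down)
growth_to_2020 = 13      # projected 13X growth in the digital universe
touched_pct_2020 = 40    # IDC: "nearly 40%" touched by cloud computing

total_2020 = total_now * growth_to_2020
cloud_share_now_pct = 100 * cloud_now / total_now
touched_2020 = total_2020 * touched_pct_2020 // 100

print(f"Digital universe in 2020: {total_2020 / ZB:.0f} ZB")
print(f"Cloud-stored share today: {cloud_share_now_pct:.3f}%")
print(f"Touched by cloud in 2020: {touched_2020 / ZB:.1f} ZB")
```

On these rounded inputs, cloud-stored data is only a few hundredths of one percent of the total today, while the amount "touched" by the cloud in 2020 would be several times larger than today's entire digital universe. That gap is what I mean by disproportionate growth.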

The growth in Amazon’s Simple Storage Service (S3) provides another compelling data point for the growth of cloud-originated data. In the past several years, Amazon’s S3 service has seen meteoric growth, now storing nearly one trillion objects (growing by 1 billion objects per day) and handling more than 650,000 requests per second (for those objects). The chart below illustrates this dramatic growth *2.



Importantly, cloud-originated data is more easily liberated (post-transaction) by other cloud services, which can unlock additional value easily and affordably.  According to a recent report by Nucleus Research, companies that more quickly utilize cloud-based analytics are likely to gain a competitive advantage:

“As companies learn to take full advantage of the analytics functionalities that are now available with utility and subscription-based pricing options, they will continue to become more able to take advantage of market trends and opportunities before their peers and take advantage of the average return of $10.66 for every dollar spent in analytics.”

Ultimately, analytics is just one of many important post-transactional uses of cloud-based data, which will surely be the subject of future posts.

Post-Transactional Cloud
My working definition of the post-transactional cloud is “the next-generation of cloud services, beyond Software-as-a-Service (SaaS), designed to enable platform and middleware tools to use cloud-originated transactional data and deliver a richer, more sophisticated computing experience.”

The concept of a post-transactional cloud provides a powerful analog that mirrors the history of the on-premises computing world. Let me elaborate.

The ERP/CRM/Supply Chain application boom of the ’80s and ’90s preceded an enormous need in the ’90s and ’00s for additional tools and software systems designed specifically to create even greater value from the data generated by these (on-premises) transactional applications. Then, tools for data management, data integration, data warehousing and business intelligence (reporting and analytics) were born to deliver this new value.

Cloud computing has grown substantially in the last 10 years largely because of applications hosted in the cloud and made available as a service directly to consumers and businesses. The poster child here is Salesforce.com (although there are thousands of others). Note that we call this category “Software-as-a-Service” when it really should be called “Application-as-a-Service” because the providers in this category are delivering a transactional, process-oriented application designed to automate and improve some functional aspect of an organization. As the use of these managed services/applications grows, so too does the quotient of cloud-originated data generated by these applications.

The dramatic rise in cloud-originated data from SaaS applications portends a similar need: this one for post-transactional cloud-based tools and software systems to define a new usage curve for liberating cloud-based data and creating substantially new organizational value. It’s just a matter of time, which makes Jaspersoft’s work with Amazon clear and understandable.

In fact, Jaspersoft’s cloud-based service (across all major Platform-as-a-Service environments, such as VMWare’s CloudFoundry and Red Hat’s OpenShift, but right now, especially with Amazon’s AWS) helps ensure our tools are the de facto standard for reporting and analysis on cloud-originated data (in the post-transactional cloud). We’ll do this in two ways:
1. By bringing our BI service to customers who already prefer to use cloud services, and by being available in their preferred cloud instead of forcing them into our cloud; and
2. By enabling elegant, affordable, embeddable reporting and analysis within cloud-based applications, so those who deliver this software can include intelligence inside their transactional applications.

At Jaspersoft, we ultimately see our cloud-based service as vital to reaching the broadest possible audience with just the right amount of reporting and analytics (not too much, not too little). The post-transactional cloud will be fueled by cloud-originated data, and the need to deliver cleverly designed intelligence inside this environment will be more important than ever.

Brian Gentile
CEO, Jaspersoft


1 I’ve borrowed the term “Post-Transactional Cloud” from ZDNet’s Andrew Brust, in his article entitled “Amazon Announces ‘Redshift’ cloud data warehouse, with Jaspersoft support”.
2 Data and chart excerpted from TechCrunch article “Amazon S3: 905 Billion Objects Stored, 1 Billion Added Each Day”, Sarah Perez, April 6, 2012.


Tuesday, November 13, 2012

The Intelligence Inside: Jaspersoft 5

For more than three years, Jaspersoft has envisioned the capabilities we’ve just announced in our v5 platform. Because we’ve always intentionally constrained ourselves by exclusively delivering client (end-user) reporting and analysis functionality inside the web browser, our quest for v5 took longer than we would have wanted. But we believe that maintaining our simple, pure, web-server-based approach to advanced business intelligence is superior to relying on desktop-specific code or even browser plug-ins, which must be installed and maintained on every computer, preventing the scale and cost advantages Jaspersoft can offer.

So the interface techniques and features we deliver are constrained by key web client technologies, especially HTML. The trade-offs we’ve lived with in the past, though, are now essentially eliminated, as a new generation of HTML5 ushers in the consistent, advanced visualization and interaction we’ve long wanted, while allowing us to maintain our pure web-based client delivery model. Satisfaction. Jaspersoft 5 is more than a pretty new face. We have delivered a completely new HTML5 visualization engine that allows a new level of rich graphics and interaction, but we’re also providing a host of new and more advanced back-end services that make Jaspersoft 5 more surely the intelligence inside apps and business processes. In total, Jaspersoft 5 includes six major new features.

1. Data Exploration 
To enable everyone to become a more capable analyst, the Jaspersoft 5 platform includes stunning HTML5 charts, a new dimensional zoom tool (for exploring data at greater or lesser levels of detail), and the ability to simply change or customize charts and tables to suit a particular type of thought or analysis.

2. Data Virtualization 
Some reporting and analysis applications are best delivered without moving or aggregating data. Instead, the query engine should virtualize those data views and enable reports, dashboards and analytic views to include data from all necessary sources. Jaspersoft 5 includes an advanced data virtualization engine so that building advanced analysis using practically any data source is straightforward, including Big Data sources.

3. Columnar In-Memory Engine 
The JasperReports Server has supported in-memory operations for several years. Jaspersoft 5 takes this to a new level with improved performance, more features, and support for up to a full terabyte of in-memory data. This means that billions of rows of data can be explored at memory speeds with our new Server.

4. Enhanced Analytics 
To give the power user analyst another reason to use Jaspersoft, we’re now including greater analytic performance, new analytic features (e.g., conditional formatting, relative date filtering, and cross-tab sorting), consistently rich visualization (see #1 above) and broadened access to multi-dimensional data sources. By supporting the latest XML/A standard, we gain certified access to Microsoft SQL Server Analysis Services (MSAS) data sources in addition to the traditional Mondrian. More power and greater choice equal greater usage.

5. Improved Administration and Monitoring 
To make life easier for those who administer and manage a JasperReports Server, we’re now using our own tool to make our Server smarter and simpler. We’ve designed a set of best-practice, interactive reports that display system health and report on the most important elements of usage. Then, we streamlined the installation and upgrade process, so that getting started and staying up-to-date has never been easier. Together, these improvements are good for our customers and for the technical team that supports them.

6. PHP Support 
Scripting tools are now the most popular for web application development. The PHP community needs more advanced reporting and analysis tools to make their applications more data-driven. By extending the JasperReports Server API to now include PHP support (via RESTful web service wrappers), we’ve taken an important first step toward supporting this fast-growing world beyond Java. Welcome to Jaspersoft.

Jaspersoft 5 is poised to deliver self-service BI to help many more users answer their own questions, not just because of the beautiful new HTML5 graphing and interaction engine, but because it is designed to be highly embeddable (into apps and business processes) and, maybe most importantly, because it scales so powerfully and affordably. Putting reporting and analytics into the hands of far more users requires this fundamental reset of the BI formula. This is Jaspersoft 5.

I invite you to learn more about Jaspersoft 5 here. And, I look forward to your comments and questions. 

Brian Gentile 
Chief Executive Officer 
Jaspersoft

Friday, July 27, 2012

Big Data: Approaches, Myths & Skills


Last month, my 18-year-old daughter asked me about Big Data. This is my first sure sign that a technology has reached a fever pitch in the hype cycle. Ironically, I found that as I explained this enterprise IT topic to my daughter, our conversation and the questions she asked did not vary greatly from many conversations I’ve had with other CEOs, journalists, financial analysts and industry colleagues. Despite how widely Big Data is being covered these days, it appears to me that Big Data is a big mystery to many.


Trying not to be labeled a cynic, I have three big worries about Big Data:

1. My biggest worry is the poor percentage of successful Big Data projects that will emerge as we too quickly throw these new technologies at a wide variety of prospective projects in the enterprise,
2. The low success rate of Big Data projects will be amplified by the current hype and subsequent misconceptions about Big Data technologies, and
3. This low project success rate could persist over time because of the relative dearth of knowledgeable, data-savvy technology and business professionals ready for a world where data are plentiful and analytic skills are not.

Successful Big Data Projects

As organizations race to evaluate and pilot Big Data tools and technologies, in search of an answer to a Big Data opportunity, I’ve seen evidence that architectural steps are being skipped in favor of speed.  Sometimes, speed is good.  In the case of Big Data, building the right data and platform architecture is critical to actually solving the business problem, which means the right amount of thoughtful planning should occur in advance.  Many missteps could be avoided by simply being clear up-front on the business problem (or opportunity) to be solved and how quickly the data must be used to enable a solution (i.e., how much latency is acceptable?).

Recently, I’ve tried to do my part to help explain successful Big Data (technical) architectures by starting with three simple, latency-driven approaches.  The specifics, including an architectural diagram, are described in my recent E-Commerce Times article, entitled “Match the Big Data Job to the Big Data Solution.” We’ve also posted additional graphics and explanation to the Big Data section of the Jaspersoft website.

Big Data Misconceptions (or Myths)
To reduce the hype, first we must overcome the misconceptions. My many conversations on the topic of Big Data yield equally many misconceptions and misunderstandings. Some examples of the most common myths: Big Data is all unstructured, Big Data means Hadoop, and Big Data is just for sentiment analysis. Of course, each of these myths is only partially true and requires a deeper understanding of the technologies and their potential uses to gain real clarity.


I’ve recently offered a brief article that seeks to dispel the “Top 5 Myths About Big Data,” published last month on Mashable. The article has garnered some great comments, with the most complete written by IBM’s James Kobielus. James improves and amplifies several of my major points. I hope you’ll join the conversation.

Analytic Skills Shortage
Worldwide digital content will grow 48% in 2012 (according to IDC), reaching 2.7 zettabytes by the end of the year. As a result, Big Data expertise is fast becoming the “must-have” expertise in every organization. At the same time, in its 2011 research report, titled “Big Data: The Next Frontier for Innovation, Competition, and Productivity,” McKinsey offered the following grim statistic:

“By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.”

Without the solid analytic skills needed to support a growing array of Big Data projects, the risk potential grows rapidly.  Anyone in or near data science should take the coming skills shortage as a call-to-arms.  Every college and university should be building data analytics coursework into compulsory classes across a wide variety of disciplines and subject areas. Because of its importance, I’ll save this Big Data skills topic as the thesis for a future post.

Despite these primary worries, I remain hopeful about (even energized by) the enormous Big Data opportunity ahead of us. My hope is that, armed with good information and good technology, more Big Data customers and projects will become more quickly successful.

Brian Gentile
Chief Executive Officer
Jaspersoft Corporation

Wednesday, March 21, 2012

Cloud BI Progress & Pitfalls

In my on-going effort to uncover and discuss key BI industry trends, I recently authored a new article for my TDWI column (called “The BI Revolution”), under the same headline as this post. In that article, I focused on the big market that will emerge for BI in the cloud. Even more importantly, I shed light on the definitional and technological pitfalls that are confusing this market as it seeks to deliver more efficient cloud-based business intelligence.

Rather than address my main points here, I encourage you to read my post at the TDWI website and then add your comments and thoughts here.

Cloud BI = BI for SaaS + BI for PaaS
I note that the cloud as a transformational infrastructure will drive big use of BI for SaaS (on-demand analytical applications) and BI for PaaS (application development and deployment in the cloud). I am less bullish on SaaS BI (on-demand, general-purpose BI in the cloud) because I believe growth will continue to be fueled by BI embedded in data-driven applications, rather than delivered in any standalone use.

We’re constantly tuning the Jaspersoft website on this topic, building out content that seeks to explain, educate and amplify the technological and business benefits of BI in the Cloud. One important point left out of my TDWI post describes Jaspersoft’s focus on and success in BI for PaaS (platform-as-a-service).

Recently, Jaspersoft has been very active in BI for PaaS. We are working with all the major PaaS providers to ensure our BI platform is available within these new cloud-based development and deployment environments. Just last month, Jaspersoft announced an important partnership with Red Hat, making our BI server available immediately in the OpenShift (public cloud) and CloudForms (private cloud) environments. Then, Jaspersoft produced a blog post and video to highlight its support of VMWare’s CloudFoundry PaaS environment, with a more formal announcement pending. Overall, our head of Product & Alliances summed it up best:

“Jaspersoft’s intention is to be the de facto standard in BI for PaaS, enabling the broadest community of software developers to use our tools in their favorite cloud environment,” said Karl Van den Bergh, Vice President of Product & Alliances at Jaspersoft. “We are uniquely positioned to capitalize on this shift of application development to the cloud with our modern architecture, the world’s largest BI community building data-driven applications, and our open source model.”

Through my recent TDWI article and this post, my goal is to clarify the cloudy definitions around Cloud BI, the important pitfalls already witnessed, and the progress we can point to as grounds for optimism about what will be a bright Cloud BI future.

Brian Gentile
Chief Executive Officer
Jaspersoft

Thursday, March 1, 2012

Got Big Data?

If competing based on time and information really will drive the next major economic era, then Big Data is real and represents a huge opportunity. If you’re a business analyst or technologist responsible for mapping data to decisions, then the variety, velocity, and volume of data available to you today has never been richer. And, your responsibility has never been greater.

I’ve previously discussed the different classes of data source technologies that can legitimately be used to harness (or tame) big data. Hadoop is one of those technologies, as the most popular software framework associated with this rising trend. Others include NoSQL databases, MPP data stores and even ETL/Data Integration approaches (for moving Big Data by the batch into some more usable format). Each of these technologies aligns with an appropriate use case, which helps make sense of the variety of products emerging in this world of Big Data.

For simplicity, I like to talk about three popular approaches to connecting to and making use of Big Data for business intelligence reporting and analysis.

Interactive Exploration – the most dynamic because it involves native connectivity directly from the BI tool to the Big Data source and can offer results in near-real-time. Hadoop HBase, Hadoop HDFS, and MongoDB are just three of the most popular data sources to which direct connection would be an advantage.

Direct Batch Reporting – an important and mainstream approach (especially in this early market of Big Data) that relies on tried-and-true SQL access to Big Data. Hadoop Hive is the best known example, but Cassandra offers CQL access that delivers similar results and functionality.

Batch ETL – using extract, transform and load techniques to create a more usable subset of the Big Data is also popular, especially when the insight being sought is less urgent, probably on the order of hours or days after data capture. Almost every ETL tool has now been improved to connect to and transform Big Data. Some even integrate nicely with underlying Hadoop technologies (like Pig), making the data steward’s life potentially simpler.
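
One way to think about the three approaches above is as a choice driven by how much latency the business can tolerate. The sketch below is only an illustration of that idea: the function name and the two cut-off values are hypothetical assumptions of mine, not thresholds defined by Jaspersoft or by any of the technologies mentioned.

```python
from datetime import timedelta

# Hypothetical cut-offs for illustration only. The three approaches are
# named in the text, but no exact latency thresholds are given there.
INTERACTIVE_MAX = timedelta(seconds=10)   # assumed "near-real-time" bound
DIRECT_BATCH_MAX = timedelta(hours=1)     # assumed bound before "hours or days"


def choose_approach(acceptable_latency: timedelta) -> str:
    """Map the latency a use case can tolerate to one of the three approaches."""
    if acceptable_latency <= INTERACTIVE_MAX:
        # Native connection from the BI tool to the Big Data source.
        return "Interactive Exploration"
    if acceptable_latency <= DIRECT_BATCH_MAX:
        # SQL-style access, for example through Hive or CQL.
        return "Direct Batch Reporting"
    # Insight needed hours or days after data capture.
    return "Batch ETL"


if __name__ == "__main__":
    for latency in (timedelta(seconds=2), timedelta(minutes=20), timedelta(days=1)):
        print(latency, "->", choose_approach(latency))
```

In practice the boundaries are fuzzier than two constants suggest, and data volume, skills and existing tooling matter too. The point is simply that the acceptable latency is a good first question to ask.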

Sometime last year, it occurred to me that Jaspersoft is in a unique position with regard to Big Data. Because of Jaspersoft’s data-agnostic architecture, we’ve quickly offered a broad variety of native Big Data connectors, many of which have been available for more than one year (for free download) . . . and because of our large, growing community of developers (we have more than 260,000 registered community members, growing at about 6,000/month at the time of this writing), we have important data about Big Data. This realization led us to the Big Data Index.

Big Data Index

We’ve tracked the downloads of our Big Data connectors over the last year, charting the ups and downs with each, corresponding to the relative rise and fall of their popularity. Over this time, we’ve seen more than 15,000 downloads, so our view is pretty good. Here’s a static version of the latest data for the four most popular Big Data connector downloads:



During the course of the past year, the Hadoop technologies (HBase & Hive combined) proved the most popular. The fastest growing and the leader at the moment is MongoDB (from 10gen). Cassandra holds a solid and consistent fourth position (which should benefit DataStax, the commercial company behind Cassandra). Many other Big Data connectors are tracked as well, with a dynamic chart updated monthly.

As interest in Big Data grows, so will the potential uses for these technologies that are designed to map this data to decisions and insights. At the moment, I’m just content knowing I have a front-row seat via the Big Data Index.

We’re at the very beginning of this era, which will surely be reliant on more data than we could have fathomed just ten years ago. This is why your thoughts and comments on this topic are appreciated.

Brian Gentile
Chief Executive Officer
Jaspersoft

Thursday, January 26, 2012

The New Factors of Production and The Rise of Data-Driven Applications

For the last ten years, I’ve been partially obsessed with the notion that the formula for creating economic value needs to be updated. I’ve worked in the technology industry for 26 years and I’ve seen information systems radically change the landscape of competition and value creation. My most recent article on this topic appears in Forbes under the same title as this post.

Because this article represents just a fraction of my thoughts on this matter, I’d like to revisit the basic premise, which is captured in the excerpt below, and then describe how some of my current experiences at Jaspersoft corroborate this newly posited IT-driven economic theory.

“Classical economic theory describes three primary factors, or inputs, to the production of any good or service: land, labor, and capital. These factors facilitate production, but do not become part of the end product (as a raw material would). While these three factors have been much discussed and extended at different points in economic evolution, I believe that they, in any of the advanced economies of the world today, are vastly antiquated.

Sometime even prior to this new millennium, the primary factors of production have now assuredly become: Time, Information and Capital. I submit that the primary relevance of land and labor has diminished, not completely but measurably, from their prominence during agrarian and industrial economic times. In a sense, owning land and employing lots of people no longer highly correlate to a valuable and successful enterprise. Although in certain industries these two factors will remain prominent (think mining and energy production, for example). By and large, land and labor have yielded to two more important factors – time and information.”

I was very pleased when Silicon Angle asked to speak with me about my background and the thoughts that led to this newly posited IT-driven economic theory as well as the contributions Jaspersoft is making to this new economic landscape. I discussed how Jaspersoft’s mission is precisely to help its customers compete on the basis of time and information.

“From its very start, Jaspersoft was determined to build and advance the industry’s most modern, flexible, and scalable Business Intelligence (BI) software. To do this, we consciously chose the open source model of development and distribution, believing that the power and principles of community involvement and broad usage would prove continually more valuable (and it has). We knew time would be important to our business model to rapidly compete in a crowded software category.”

Jaspersoft focuses on delivering its modern BI software to those who are best suited to create value from it. We call these individuals “BI Builders” because they possess a powerful confluence of knowledge about data, analytics, and business (process, function, industry, etc.) that truly yields new value from insight. The result is thousands of commercially available software applications that include Jaspersoft technology. These software applications power the world and deliver faster, more effective insight into data. Jaspersoft’s open source model affords these applications very high quality reporting and analytic capabilities at a very low cost, so our customers create new economic value, arguably, where it could not have been created in the past.

In many ways, the BI Builder is the real hero in the equation that determines how companies can compete more effectively based on time and information. Jaspersoft simply becomes their partner and enabler.

Here’s a chance to continue this dialog at the intersection of economic theory and information technology. I offer an open invitation for comments. Your thoughts are appreciated.

Brian Gentile
Chief Executive Officer
Jaspersoft

Tuesday, November 8, 2011

Making Sense of it All

I’ve been writing about how important it is to build and deliver big data projects that can succeed, because the opportunity to do so has never been better and the business reasons to do so have never been more compelling. Seems like each week, more tools and products are available to make big, complex data types useful for a variety of business purposes.


But, what about the unforgiving worlds of natural language and semi-structured data sources? Is there any hope to generate insight from them, even in this new big data world?


It’s one thing to make sense of more traditionally structured big data sources; it’s quite another to parse natural language and complex, industry-specific data types. To quickly understand the difficulties of these data environments, I recommend Brett Sheppard’s excellent blog post on this topic.


Informatica’s HParser to the Rescue

Enter Informatica’s HParser, announced last week. Now, accessing and then making sense of practically any data type has just become far simpler. You can learn more about this important new Informatica product here. HParser is a parsing technology that can run inside a MapReduce job and which allows users to structure the unstructured or semi-structured data in Hadoop and ready it for analysis. This takes a lot of the complexity out of creating custom scripts, which is what developers need to do today. HParser is available in both a community and commercial edition and features a visual development environment that, when combined with its myriad out-of-the-box parsers for semi-structured industry standard data, can eliminate up to 80% of the time it takes to turn this data into insight.


Integration with Jaspersoft

I’m thrilled that Jaspersoft has collaborated with Informatica to deliver rich reporting and analysis of natural language and semi-structured data, working directly with Informatica’s new HParser. Through integration with Jaspersoft’s BI server, creating any variety of reports and analyses is drag-and-drop easy. You can learn more about our work together through this brief video.


In short, we’ve worked with Informatica to ensure the Jaspersoft BI platform can provide analytic access to Hadoop for anyone who needs to access and understand data – whether it’s an executive who wants a summarized dashboard or a manager who needs a detailed operational report. And, our BI platform can handle both batch processing (through Hive) and direct, ad hoc and near real-time access to this data, which we uniquely provide through direct HBase access. That should satisfy even the most analytic end user.


Now there’s no reason not to consider any big data source. Toward the goal of genuinely harnessing the opportunity all this new (big) data represents, it’s good to see Informatica and Jaspersoft help lead the way. Your comments are appreciated.


Brian Gentile
Chief Executive Officer
Jaspersoft