Author Archives: pacogd

Introducing U-SQL; Part 1 of 2: Introduction and C# extensibility. — Big Data Virtual Chapter Meeting

Online Meeting URL: https://attendee.gotowebinar.com/register/400344650927823106
RSVPURL: https://attendee.gotowebinar.com/register/400344650927823106

Speaker: Michael Rys

Abstract

Making Big Data processing easy requires great developer support that hides the complexity of managing scale, makes it easy to integrate custom code for complex processing requirements ranging from data cleanup to advanced processing of unstructured data, and provides great tooling to help the developer through the iterative development process. Thus, when we at Microsoft introduced the Azure Data Lake, we decided to also include a new language called U-SQL to make Big Data processing easy. It unifies the declarative power of SQL and the extensibility of C# to make writing custom processing of Big Data easy. It also unifies processing over all data – structured, semistructured and unstructured data – and queries over both local data and remote SQL data sources. This presentation will give you an overview of U-SQL, explain why we decided to build a new language and what its core philosophical underpinnings are, and show the language in its natural habitat – the development tooling – demonstrating the language capabilities as well as the tool support, from starting your first script to analyzing its performance.

 

Bio

Michael has been doing data processing and query languages since the 1980s. Among other things he has been representing Microsoft on the XQuery and SQL design committees and has taken SQL Server beyond relational with XML, Geospatial and Semantic Search. Currently he is working on Big Data query languages such as SCOPE and U-SQL when he is not enjoying time with his family under water, on the ski slopes, or at autocross.

Paco Gonzalez — Big Data Virtual Chapter Leader

 

Machine Learning and Cortana Perceptual APIs — Big Data Virtual Chapter Meeting

Online Meeting URL: https://attendee.gotowebinar.com/register/7985201368974943490
RSVPURL: https://attendee.gotowebinar.com/register/7985201368974943490

Machine Learning is key to Predictive Analytics and Big Data. It helps us identify patterns in data and is critical in answering complex business questions. This session presents the use of Machine Learning and Cortana Perceptual APIs in common business scenarios. Cortana Perceptual APIs (Face, Vision, Speech, LUIS, Tone) are based on complex machine learning models that help us move from unstructured big data (video, text, audio, speech) to structured data. We’ll show a demo of how it works and then business cases in both the banking and media industries where Cortana improved IT ROI, compliance and customer retention.

Speaker:

Paco Gonzalez:

Paco is a SolidQ Mentor and an MCT and MCP on BizTalk Server and SQL Server. He is coauthor of Microsoft Training Kit 441, the PASS Big Data Virtual Chapter Leader, SQL Saturday Barcelona Organizer and a volunteer at the PASS Atlanta BI User Group. He is also writing his PhD thesis on Analyzing Social Data with Machine Learning. Paco is a frequent speaker at conferences both large and small: SQL Saturdays, TechEds, PASS BA, PASS Summit, DevWeek London, and PAW Chicago and London. He is based in Atlanta and has also worked for SolidQ in the UK, Australia and Spain.

 

Paco Gonzalez — Big Data Virtual Chapter Leader

Introducing New Capabilities for Big Data Analytics in the Cloud — Big Data Virtual Chapter Meeting

http://bigdata.sqlpass.org

Language: English
Event Type: Online
Online Meeting URL: https://attendee.gotowebinar.com/register/8460698364507116289
RSVPURL: https://attendee.gotowebinar.com/register/8460698364507116289

Abstract: At Microsoft, we work with thousands of enterprises every day, and leverage exabytes of data to run our own businesses such as Windows, Bing and Xbox. In this talk, we’ll showcase key customer use cases that show how we have overcome common obstacles to big data adoption, such as the steep learning curve, the cost of implementation, the challenges of tuning infrastructure, and the need for enterprise-grade security. Come learn how you can leverage the rich analytic services in Azure to transform your data easily at any scale with the tools you already know. This talk will introduce new capabilities to process data at any scale using familiar tools. You will also learn how to compose tools such as Spark and R, or do scale-out querying on demand, at massive scale, with no hardware to deploy, no software to tune or configure, and no infrastructure to manage.

 
 

Speaker

Matt Winkler, Program Manager for HDInsight at Microsoft.

When


UTC : Tue, Oct 06 2015 18:00 – 19:00
Event Time : Tue, Oct 06 2015 14:00 – 15:00 Eastern Daylight Time

Cortana Analytics Workshop

I attended the Cortana Analytics Workshop at the Microsoft offices in Redmond. I have been working with all these products over the past few years and I love them. I believe in their potential, and I am sure our customers will benefit from using them.

Seeing Microsoft present the Cortana Analytics Suite with such passion is fabulous.

Microsoft is by far #1 in Enterprise Big Data and Data Science, so we are in the right place :)

Some of the products in the Cortana Analytics Suite are:

Azure Machine Learning

Azure HDInsight

Azure Stream Analytics

Azure SQL Data Warehouse

Azure Data Lake

Event Hubs

Azure Data Catalog

Azure Data Factory

Check http://www.microsoft.com/en-us/server-cloud/cortana-analytics-suite/overview.aspx and its webinar to start playing.

Follow @MSAdvAnalytics to stay tuned; Microsoft will be uploading the recordings of the sessions.

Some of the impressive sessions on Azure Machine Learning and Data Science were:

Cortana Analytics Perceptual APIs

Cortana Analytics for Marketing

Predictive Maintenance in the IoT Era

Deep Neural Networks

Milliman Integrate Powered by Cortana Analytics

Enjoy :)

R & SQL Server 2016


Some of the limitations we have with R: 1. Data “must” be in memory; 2. The process “is not” executed in parallel.

There are workarounds for both the in-memory and the parallel limitations, but they require custom development.
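To make the two limitations concrete, here is a minimal sketch of the kind of custom development such workarounds involve. It is written in Python rather than R purely for illustration, and the function names (`chunks`, `partial_stats`, `chunked_mean`) are hypothetical. It reads the data in fixed-size chunks, so only a few chunks are in memory at a time, and evaluates per-chunk summaries on a worker pool. A thread pool is used here to keep the example simple; CPU-bound work would normally call for a process pool.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunks(rows, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(rows)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def partial_stats(block):
    """Per-chunk summary (count, sum); tiny compared with the chunk itself."""
    return len(block), sum(block)


def chunked_mean(rows, size=1000, workers=4):
    """Mean of `rows`, holding at most `workers` chunks in memory at once."""
    n, total = 0, 0.0
    chunk_iter = chunks(rows, size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            # Pull only as many chunks as there are workers, then process them.
            batch = list(islice(chunk_iter, workers))
            if not batch:
                break
            for count, subtotal in pool.map(partial_stats, batch):
                n += count
                total += subtotal
    return total / n if n else float("nan")
```

The same split into per-chunk summaries followed by a combine step only works for statistics that can be merged this way (counts, sums, means); many models need algorithm-specific rewrites, which is why this route means custom development.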

RStudio.

One example of how we deal with SQL Server data and R is to have RStudio read from SQL Server using RODBC or RJDBC, execute some R, and then move the results back to SQL Server. With some custom development we are able to build a model and then query it but, as said, this means too much custom development and poor performance for real-time predictions. The in-memory and parallel limitations still apply.
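The round trip described above (read from the database, fit a model outside it, write the results back) can be sketched as follows. This is an illustration under stated assumptions, not the actual RStudio setup: it uses Python instead of R, the standard-library `sqlite3` module stands in for SQL Server, and the `sales` and `predictions` tables and the `fit_line` and `score_round_trip` functions are hypothetical.

```python
import sqlite3


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def score_round_trip(conn):
    """Pull rows out of the database, fit a model, push predictions back."""
    # 1. Move the data out of the database into client memory.
    rows = conn.execute("SELECT id, x, y FROM sales ORDER BY id").fetchall()
    # 2. Build the model outside the database.
    slope, intercept = fit_line([r[1] for r in rows], [r[2] for r in rows])
    # 3. Move the results back into the database.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions "
        "(id INTEGER PRIMARY KEY, predicted REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO predictions (id, predicted) VALUES (?, ?)",
        [(r[0], slope * r[1] + intercept) for r in rows],
    )
    conn.commit()
    return slope, intercept


# Hypothetical sample data; an in-memory SQLite database stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO sales (x, y) VALUES (?, ?)", [(1, 3), (2, 5), (3, 7), (4, 9)]
)
slope, intercept = score_round_trip(conn)
```

Every step moves data across the client/database boundary, and the whole table is held in client memory, which is where the extra development effort and the performance cost for real-time predictions come from.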

Azure Machine Learning.

Azure ML is great for dealing with R. We can load data into Azure ML from different Microsoft products and external sources, execute R, and then move the results back into a repository. Querying for predictions also needs some custom development but is doable. Azure ML runs in the cloud, so it is not an option for on-premises customers.

SQL Server 2016

From what Microsoft is presenting (Ignite) and what we can see in the CTP, this is like seeing a dream come true.

Part of this success is because of the acquisition of Revolution Analytics.

T-SQL executes R directly, with no RStudio or any other external piece. Using data stored in SQL Server, we can build a model, query a model, check its accuracy, and store the results back in SQL Server. Applications can query a model by connecting to SQL Server, as we do with a linked server to SSAS for data mining.

Revolution R was able to deal with the in-memory limitation and to execute in parallel, so we can assume this will also happen with R within SQL Server.

SQL Server supporting R in the database engine brings a lot of power to our data mining capabilities: we can have the full functionality of R in our SQL Server environment. It is easy to process or query. It is on-premises!!!

R is a great data science scripting language, library collection and execution environment, and R + SQL Server is a huge step. I am building BIML code to generate packages that call T-SQL for processing, querying and testing different models, with easy integration with Power BI, SSRS, tabular DirectQuery predictions… Huge step.

 
 

The image in this post comes from the SQL Server blog: http://blogs.technet.com/b/dataplatforminsider/archive/2015/05/04/sql-server-2016-public-preview-coming-this-summer.aspx

Paco Gonzalez

paco@solidq.com

Finding business value in Big Data (What exactly is Big Data and why should I care?)

Next Meeting: Tue, May 26 2015

Finding business value in Big Data (What exactly is Big Data and why should I care?)

Language: English

Event Type: Online

Online Meeting URL: https://attendee.gotowebinar.com/register/8428270465460421122

RSVPURL: https://attendee.gotowebinar.com/register/8428270465460421122

I often hear from clients: “We don’t know much about Big Data – can you tell us what it is and how it can help our business?” Yes! The first step is this vendor-free presentation, where I start with a business-level discussion, not a technical one. Big Data is an opportunity to re-imagine our world, to track new signals that were once impossible to track, and to change the way we experience our communities, our places of work and our personal lives. I will help you identify the business value opportunity in Big Data and how to operationalize it. Yes, we will cover the buzzwords: modern data warehouse, Hadoop, cloud, MPP, Internet of Things, and Data Lake, but I will show use cases to help you understand them better. In the end, I will give you the ammo to go to your manager and say “We need Big Data and here is why!” Because if you are not utilizing Big Data to help you make better business decisions, you can bet your competitors are. Level 100

Speaker James Serra:

James is a technology solution professional specializing in big data and data warehousing using the Analytics Platform System (APS), a Massively Parallel Processing (MPP) architecture. Previously he was an independent consultant working as a Data Warehouse/Business Intelligence/MDM architect and developer. He is a SQL Server MVP with over 25 years of IT experience. James is a popular blogger (JamesSerra.com) and speaker. He is the author of the book “Reporting with Microsoft SQL Server 2012”.

 

Paco Gonzalez — Big Data Virtual Chapter Leader

Apache Storm – Real-time Analytics

Tue, Mar 17 2015

Language: English

Event Type: Online

Online Meeting URL: https://attendee.gotowebinar.com/register/9007907704887741954

RSVPURL: https://attendee.gotowebinar.com/register/9007907704887741954

There has been a lot of buzz around IoT (Internet of Things), which implies streaming data at a high rate and either persisting it on disk for further analysis or applying business rules against the stream in real time. Storm is a mechanism for doing this and is widely adopted and used by some of the largest, most successful companies in the world. This webinar will talk about what a Storm architecture looks like and dive into the components and terminology that will help you get started. We’ll explore various use cases so you can determine if Storm may be a solution for your organization.

Speaker:

 

Scott Shaw

Solutions Engineer

Hortonworks

Scott Shaw has over a decade of experience in data management. He is a frequent speaker at local and national community events and has co-authored two books on T-SQL. He teaches and implements Pig, Hive and Hadoop, and also teaches courses on SQL Server and Microsoft BI. He is currently working on a book titled “Practical Hive”, to be published by Apress. He lives in Saint Louis and is a Solutions Engineer for Hortonworks.

When

 

UTC : Tue, Mar 17 2015 18:00 – 19:00

Event Time : Tue, Mar 17 2015 14:00 – 15:00 Eastern Daylight Time

Big Data Mining from SSAS, AzureML to Mahout

 

Hello, I am the speaker of our next Big Data virtual chapter meeting:

Tue, Feb 10 2015

Language: English
Event Type: Online
Online Meeting URL: https://attendee.gotowebinar.com/register/5660132292319688449
RSVPURL: https://attendee.gotowebinar.com/register/5660132292319688449

Abstract:

Join this session to learn about traditional data mining with the power of Mahout in Hadoop and Azure Machine Learning. Mahout is a machine learning library supported in HDInsight. HDInsight is the Microsoft service that brings Apache Hadoop to the cloud, and Mahout is a powerful tool for processing models within HDFS and MapReduce. This session will cover how data mining is implemented in the context of big data and the cloud, and will walk through the full data mining cycle: from ETL and building, testing, and training data models to visualization, testing, and real-time querying. You will see how to discover patterns and make predictions, classifications, and recommendations to get all the insights from your structured and unstructured big data.

Speaker: Paco Gonzalez

Paco Gonzalez, a SolidQ Mentor, is the PASS Big Data Virtual Chapter Leader, SQL Saturday Barcelona Organizer and assistant at the PASS Atlanta BI User Group. He is finishing his PhD thesis, “Analyzing Social Data with Machine Learning.” He is an MCT and MCP on BizTalk Server and SQL Server. He is coauthor of Microsoft Training Kit 441. Paco is a frequent speaker at large and small conferences: SQL Saturdays, TechEds, PASS BA Chicago, PASS BA San Jose, DevWeek London, and PAW Chicago and London. Paco relocated from London to Atlanta in late 2014.

@pacoSQL