Modern Data Science

Today, Modern Data Science stands on the shoulders of traditional Data Science and Six Sigma for statistical uses and goes much further with big and rich data. It uses regression and prediction, classification, and hypothesis testing, and adds deep learning and methodologies for "recommendation systems." Some people see these as modern Artificial Intelligence (AI) systems. Early advances were made by Google pioneers whose MapReduce and Google File System work, developed while running a world-class internet search service at scale, inspired the open-source Hadoop project. Facebook has been key for advanced social enablement, data collection, and recommendation engines. Our world has advanced rapidly with Big Data and Modern Data Science because of these early and commendable efforts. Data Science is now an area of study in many college programs.

"Data scientist" has also become a popular occupation with Harvard Business Review dubbing it "The Sexiest Job of the 21st Century"

"Data Science Process" -  Decomposition (orange points)

Below is a process for Data Science. You will notice that steps 1 through 3 are the traditional steps of the Extract, Transform, and Load (ETL) process for Enterprise Data Warehouses (EDWs). Data Science can include formal ETL or informal end-user ETL. The orange circle numbers and notes were added by Alexicon to demonstrate our understanding of, and focus on, embedding learned know-how from Data Science activities (4) into the formal ETL process (1, 2 & 3) and/or including it in the database layer (3 & 5) as embedded Models & Algorithms for runtime computations. The EDW will end up having a “clean dataset” or table(s), plus algorithms or computation code for database use.

"From the business perspective, data science is an integral part of competitive intelligence, a newly emerging field that encompasses a number of activities, such as data mining and data analysis."


In-database Computations Are Better and Faster

The place for proven computations is in the database, not in client desktop tools. Algorithmically efficient computations are important with large data sets, and they must also be meticulously vetted for accuracy. Users expect response times of under two seconds from newer databases, versus many seconds or minutes from traditional big reporting systems. There is still a need to balance summary and detail levels. The goal is to run computations in the database at very high speed and send only summary or aggregate results to the visualization layer or report. Smaller result sets from the database are also easier on the network. Fast query results give BI tool users quick access to what they want when they need it, and user and analyst time typically comes at a premium. Many BI systems serve as a learning hub for a company by fostering user exploration, so speed is important. Highly used systems provide a great return on investment (ROI) across the income statement (P&L) and balance sheet over many years.
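As a rough illustration of the idea above, the following minimal sketch compares aggregating inside the database with pulling every detail row to the client. It assumes an in-memory SQLite database and a hypothetical "sales" table with made-up values; the table, column names, and function names are illustrative only and are not taken from any Alexicon system.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    rows = [
        ("East", 100.0),
        ("East", 250.0),
        ("West", 75.0),
        ("West", 25.0),
        ("North", 300.0),
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn


def totals_in_database(conn):
    """Let the database aggregate; only one summary row per region is returned."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return dict(cur.fetchall())


def totals_on_client(conn):
    """Pull every detail row to the client and aggregate there (more data moved)."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


conn = build_demo_db()
in_db = totals_in_database(conn)
on_client = totals_on_client(conn)
print(in_db)
```

Both functions produce the same totals, but the in-database version returns three summary rows instead of five detail rows. On a table with millions of rows, that difference in result-set size is what keeps the network and the visualization layer lightly loaded.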

High-dimensional Statistics

Advanced use of massive multidimensional spaces is one of our key focuses. These new systems consider many text and numeric values, including where you live, your likes, your dislikes, your gender, and many other aspects of your publicly available information or “consumer data.” There is also an abundance of worldwide government and business data. Common areas of data use include finance, geographic points and areas, weather, and Industrial IoT (sensor or machine data). We see a strong relationship between multidimensional BI and high-dimensional statistics. Both can involve many dimensions and many numeric values in one schema, related through conformed dimensions or matched data, so that different measurable aspects of a business can be compared and contrasted.

Modern Data Science makes possible newer and valuable capabilities like recommendation engines. We see these recommendation engines in play when shopping online, with similar product recommendations or ads being displayed while we browse or shop. These engines are capable of understanding and associating people with products, or people with people, because of high-dimensional statistics. This is the big win! It is difficult for people to see and comprehend more than a few dimensions at once. We typically use a two-dimensional (2D) plane, or X,Y plot, to correlate or associate variables. Moving from two to five dimensions gets difficult, not to mention the hundreds or thousands of dimensions and numeric values that exist in Enterprise and Big Data environments today.
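One common way to associate people with people across many dimensions is to treat each person as a vector of numeric values and compare vectors with cosine similarity. The sketch below is a simplified illustration under that assumption; it is not a description of how any particular company's recommendation engine works. The profile names, the six made-up dimensions, and the helper functions are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero profile is treated as similar to nothing
    return dot / (norm_a * norm_b)


def recommend(target, profiles, top_n=1):
    """Return the names of the profiles most similar to the target profile."""
    scores = [
        (cosine_similarity(profiles[target], vector), name)
        for name, vector in profiles.items()
        if name != target
    ]
    scores.sort(reverse=True)  # highest similarity first
    return [name for _, name in scores[:top_n]]


# Hypothetical ratings across six dimensions (e.g., product categories).
profiles = {
    "alice": [5, 0, 3, 0, 4, 1],
    "bob": [4, 0, 3, 1, 5, 0],
    "carol": [0, 5, 0, 4, 0, 3],
}

print(recommend("alice", profiles))
```

The same arithmetic works unchanged whether a profile has six dimensions or six thousand, which is why a machine can find close matches in spaces that people cannot visualize.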

We are doing things today that were never before possible because of high-dimensional statistics. This is where Modern Data Science and Machine Learning capabilities have taken a quantum leap in recent years, particularly in prediction within massive data sets that have high volume, variety, and velocity (the three V’s). This creates complex and fast-moving, or even streaming, data environments, where Modern Data Science is used as ETL to transform and load data on the fly and/or perform real-time actions.
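To make "transform on the fly" concrete, here is a minimal sketch of a streaming step that cleans each incoming reading, keeps a running mean, and flags values that deviate sharply from it. The function name, the threshold rule, and the sample readings are assumptions chosen for illustration; production streaming systems would typically use a dedicated streaming platform rather than a single Python generator.

```python
def stream_etl(readings, threshold=2.0):
    """Yield (value, running_mean, alert) for each parseable reading.

    A reading raises an alert when it differs from the mean of the
    previously seen readings by more than `threshold`.
    """
    count = 0
    mean = 0.0
    for raw in readings:
        try:
            value = float(raw)  # transform: parse text into a number
        except (TypeError, ValueError):
            continue  # drop records that cannot be parsed
        alert = count > 0 and abs(value - mean) > threshold
        count += 1
        mean += (value - mean) / count  # incremental mean, no history stored
        yield value, mean, alert


# Hypothetical sensor feed containing two unusable records.
results = list(stream_etl(["10", "11", "bad", None, "20", "10.5"], threshold=5.0))
for value, mean, alert in results:
    print(value, round(mean, 3), alert)
```

Because the running mean is updated incrementally, the step never needs to hold the full history in memory, which is the property that lets this kind of logic keep up with fast-moving data.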

Data that is classified correctly allows users to compare and contrast different aspects of the enterprise with advanced business analytics capabilities. We could draw a parallel between Modern Data Science and radar when it was originally introduced. Radar was used for military purposes to allow long-range visibility even in darkness and fog, something humans could not do without it. This is much like using high-dimensional statistics to find needed or important information in massive multidimensional spaces. Without the machine's help, it is not humanly possible to see through high-dimensional spaces, just as it is not possible to see through fog.

Recommendation systems have fostered modern advancements in high-dimensional statistics through deep learning and other techniques used by early adopters like Amazon, Netflix, Yelp, Pandora, and Tinder for online use and for matching (recommending) people to products or people to people.

Amazon is an example of a company that uses Big Data and Modern Data Science for internal operations, website operations, product recommendations, and taking orders. They have mastered the management of process and data so well that they offer Amazon Web Services (AWS) so other companies can take advantage of their advanced system infrastructure and architecture to store, compute, and analyze data with advanced capabilities.

Alexicon is committed to helping customers increase enterprise performance capabilities with advanced business analytics using proven Business Management methods, Modern Data Science, and Lean Six Sigma (LSS). We believe this combination is unique and valuable to all companies when its techniques and methods are integrated and coordinated. This is especially true for major corporations with initiatives to increase sales and/or profit while controlling costs.

© 2019 Alexicon Corporation. All rights reserved.