2019/11/05 13:15 “Barriers to Data Science Adoption: Why Existing Frameworks Aren’t Working”, Workshop at CASCON-Evoke, Markham, Ontario

Workshop led by @RohanAlexander and @prof_lyons at #CASCONxEvoke on “Barriers to Data Science Adoption: Why Existing Frameworks Aren’t Working”, with the following abstract.

Broadly, data science is an interdisciplinary scientific approach that provides methods to understand and solve problems in an evidence-based manner, using data and experience. Despite the clear benefits from adoption, many firms face challenges, be that legal, organisational, or business practices, when seeking to implement and embed data science within an existing framework.

In this workshop, panel and audience members draw on their experiences to elaborate on the challenges encountered when attempting to deploy data science within existing frameworks. Panel and audience members are drawn from business, academia, and think-tanks. For discussion purposes the challenges are grouped within three themes: regulatory; investment; and workforce.

The regulatory framework governing data science is outdated and fragmented, and for many new developments, regulations are in a state of flux, or non-existent. This creates an uncertain environment for investment in data science and can create barriers to the widespread adoption of state-of-the-art data science. For instance, the governance of data use and data sharing are unclear, and this may compromise trust in data. Additionally, privacy laws, currently under scrutiny in many countries, may limit how firms can use data in the near future affecting innovation, and planned investments (e.g., Google Sidewalk). As data science technologies and applications change rapidly, the regulatory framework must continually evolve or risk becoming outdated and a hindrance to developments in the field.

Investment risk exists for any project; however, data science projects are especially risky for various reasons, including the fundamental role that datasets play. Creating, cleaning, updating, and securing a dataset is a difficult process that requires a substantial investment of resources. And while these processes are essential to extracting value from data science, they rarely provide value themselves, which can be a challenge when making a business case and investment decision, and adds risk to the decision to adopt data science practices, especially for small- and medium-sized businesses.

The workforce challenges of data science are extensive. It is difficult to recruit qualified candidates due to the specific skill sets needed, and, with more firms seeking to implement the new innovations, this problem is expected to become worse. Additionally, many fear the lack of diversity in the current pool of workers may hinder progress in cases where the data science applications are context specific and would benefit from subject-matter expertise and a diversity of experience.

Outcomes of the workshop are expected to include a report that lists a set of existing practices and high-level barriers to deployment.

Intro from Rohan Alexander (UToronto iSchool), co-organized with Kelly Lyons (UToronto iSchool), Michelle Alexopoulos (UToronto Economics), Lisa Austin (UToronto Law)

Data science adoption doesn’t seem to have changed over the past 5 to 10 years

Three themes:

  • Legal frameworks, consent issues, interacting with other jurisdictions
  • Organization challenges:  Difficult to add to old organization, lack of qualified candidates, lack of diversity, pipeline issue of graduates going to other countries
  • Risks:  Have to get clean datasets, so rational at 5% makes sense, or allocation of resources?

Submit questions to Slido.com, #L763

This digest was created in real-time during the meeting, based on the speaker’s presentation(s) and comments from the audience. The content should not be viewed as an official transcript of the meeting, but only as an interpretation by a single individual. Lapses, grammatical errors, and typing mistakes may not have been corrected. Questions about content should be directed to the originator. The digest has been made available for purposes of scholarship, posted by David Ing.

Panel discussants

CASCONxEvoke Workshop Panel

Omnia AI

  • Launched by Deloitte 5 years ago
  • Ran survey, four themes
  • Found 16% adoption of AI in industry
  • 1. Lack of understanding:  Only 5% of Canadians think that they will be impacted by AI over the next 5 years, despite having smartphone.
  • 2. Lack of trust:  Data breaches, misuse of data.  Killer robots, not what machine learning is about.  Boston Dynamics video creates misconceptions.  Also chatbots used in customer care, fancy versions of press 1 for this, press 2 for that, yet people use terms like “computers are seeing”.  Computer systems as omnipresent, and don’t trust decision-makers.
  • 3. Lack of awareness:  In Toronto, ecosystem of startups, but difficult for them to link to enterprise companies.  Not getting in front of decision-makers.  Enterprises feel risk of dealing with startups that may not be around in a few years.  Hard to advertise, misuse of language.
  • 4. Inability to scale:  Companies don’t know how to adopt.  May hire data scientists, but into corner, and think they’ll do cool stuff and make money.  Have to think of ROI from beginning.  May not have incentives to put into production, after the work is done.  Prove to me it works, versus assume that it’s going to work.

Ajiolomohi Egwaikhide, Senior Data Scientist, IBM Systems

  • What can go wrong?  Bad algorithm, or bad data
  • Customers want to take data, and do cool stuff, but don’t have enough data or right data to solve business problem.  Then end up with backlash.
  • Bad data: 
    • a. Insufficient quantity
    • b. Non-representative training data, or data isn’t telling them what they’re thinking.
    • c. Quality of data, has a lot of outliers, noise, missing data.  Don’t know what they should be collecting.
    • d. Irrelevant features:  Lots of columns of database, but no business capabilities around them
  • Bad algorithm:
    • a. Using fancy algorithms instead of simple models, e.g. survivor algorithm versus simpler logistic regression.  Not selling the right thing. 
    • b. Underfitting
  • People jumping into data
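As a rough illustration of the "bad data" symptoms listed above (insufficient quantity, missing data, outliers), here is a minimal sketch in Python. It was not shown in the session; the function name `data_quality_report`, the thresholds `min_rows` and `z_cutoff`, and the sample records are all hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev

def data_quality_report(rows, column, min_rows=30, z_cutoff=3.0):
    """Summarize three 'bad data' symptoms for one numeric column.

    rows: list of dicts, one per record; a missing value is None or an absent key.
    min_rows and z_cutoff are illustrative thresholds, not recommendations.
    """
    values = [r.get(column) for r in rows]
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)

    # Flag values more than z_cutoff sample standard deviations from the mean.
    outliers = []
    if len(present) >= 2:
        mu, sigma = mean(present), stdev(present)
        if sigma > 0:
            outliers = [v for v in present if abs(v - mu) / sigma > z_cutoff]

    return {
        "n_rows": len(rows),
        "too_few_rows": len(rows) < min_rows,  # insufficient quantity
        "missing": missing,                    # missing data
        "outliers": outliers,                  # noise / extreme values
    }

# Hypothetical sample: 30 ordinary records, one missing value, one extreme value.
sample = [{"amount": 10.0 + i} for i in range(30)]
sample.append({"amount": None})
sample.append({"amount": 10_000.0})
report = data_quality_report(sample, "amount", min_rows=100)
```

A check like this only covers the mechanical symptoms. Whether the data is representative, or whether a column is relevant to the business problem, still needs someone who understands the domain.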

Inmar Givoni, Uber Self-Driving Automobile Division

  • Haven’t defined adoption.
  • John McCarthy said if it works, don’t call it artificial intelligence.
  • There’s a lot of adoption, e.g. a smartphone has 100 instances of what we might call AI.
  • Legal aspects:  e.g. supervised deep learning algorithms, in medical imaging, but then issues with privacy and disagreements from experts on labels, should otherwise be solvable.
  • Risks:  Idea of killer robots.  Self-driving paradox, if get 10% improvement, would have 1.2M die instead of 1.3M, isn’t a personalized argument.
  • Technical:  From software engineering, coding algorithm, get a precision or metric of interest, you could have messed up, you wouldn’t know, because it’s not testable in the same way as regular software.  If can tune parameters, if you don’t have a deep understanding or mathematical intuition, will get people throwing data at it.  Irresponsible use.
  • Frameworks (e.g. TensorFlow) are still experimental, missing debugging, control flow.
  • Policy:  Technology ahead of law.  Ethical considerations, e.g. people messing up traffic signs.  Will continue working on robustness, but people should go to jail for tearing down a traffic sign.
  • Productionization:  Have data scientists, prototype quickly in a sandbox environment, load, train metrics, and they say it will work.  But then to put into a production system, it’s streaming and works in real time.  It doesn’t care about models, it cares about output and costs.  e.g. build a detector 5% better, but then the car doesn’t work as well.  Not good correlation between model-level metric and system-level metric.

Legal perspective (Aaron?)

  • Barrier to adoption:
  • (i) Regulatory:  Laws are antiquated.  Cambridge Analytica, etc., is based on the consent-based model.  People don’t read the terms they click on.  Transparency.  Dealing with disclosure.  We don’t know what we’re agreeing to.
  • (ii) Investment and risk:  Big undertakings, expensive.  Senior management vs. data scientists.  Companies treating data science as just another project.  Data quality.  Considered in a silo.  No architecture for data.
  • (iii) Workforce, trading and labour market:  Requires a lot of expertise, there’s a shortage of people.  Difficult to recruit people with skills.  Expect to become worse.  Lack of qualified labour.
  • Are there ways that universities could be more involved?  Can we build universities, or training protocols, within companies?
  • Data science will become more important, not less.  How to handle?

Panel discussion together.

Clients coming for consultants, because they want to push something forward, and say we can do it ourselves.  Data scientists put into a position to just solve this.

Bad reputation on trust and ethics.

  • Data science equated to steam engine and electricity.
  • Clients aware that they should be doing something.
  • What, but then trust and ethics.
  • In banking, lots of accountability associated with models in production.
  • Line between statistics and machine learning is blurry.
  • Struggling with black box approaches
  • Lack of trust slows us down
  • How can I do it?

Question:  Reputation of data science, and how it impacts work?

When are we going to get the cars off the road?

  • Companies are heavily invested, but it’s an open-ended scientific problem.
  • We don’t know how to do it, but then companies say we’re going to have it by end of the year.
  • Already have assistive technologies that work well if someone is behind the wheel
  • Robustness at 99.9% in a lot of technologies today
  • But have a lot of variability today.
  • If have someone behind wheel that could save from serious errors, or on given route, or under weather conditions, different.

Question:  Unsolvable problems.  Researchers show examples of recognizing cat from dog, but then expect we can do cars.  Problem of lack of understanding, but it’s not zero-one.  They’re feeling overconfident.  A general vision of what AI could do, but we’re not there today.

Interdisciplinarity is complicated.  Executives not making bad decisions, requisite understanding.  It’s economic, technical, trade, privacy, transnational.  Evolutionary, not binary on-off.  Will get to better decision-making frameworks, but will take time.

Question:  Reputation.  AI is doing their job, not true, could augment.  How to correct messaging?

In high school, start of robotics, were promised 4-day work weeks.  Predict that there will be no such thing as replacement.

People’s jobs change over time.  Agriculture.  Call centres are replaced by chatbots.  If remove 80% of mundane work, but unemployment isn’t 50% today from agriculture.  Problem is over what time frame, 5 years or 15 years.

Question:  Automation versus AI?

In school, used term machine learning instead of AI.  Now AI is everything.  Lots of natural language processing.  Technologies are getting more intertwined.

People myopic about technology.  Many jobs get created in unpredicted ways, for non-technical people.  Gig economy.  How many people get married through online dating apps?  Using AirBnB, Uber, are rapid changes in life.

People thinking about how should retrain.  Retraining programs by Bank of Canada aren’t being used.

Questions:  From Twitter — It’s AI when you’re funding.  It’s machine learning when you’re building.  It’s logistic regression when it’s implemented.

First course in machine learning includes logistic regression and linear regression.  If you want to call it AI, call it AI, it doesn’t matter any more.

Question:  Mining sector, predictive maintenance.  Anti-fraud, in banking, 80% of workload is logistic regression. 

If you can structure data in a table, use logistic regression.  Get stability, robustness, convergence.
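To make the point about tabular data concrete, here is a minimal sketch of logistic regression fitted by batch gradient descent, in plain Python. It is not from the panel: the toy table (two standardized columns, loosely standing in for something like age and income), the labels, and the learning rate and epoch count are all hypothetical. In practice one would normally reach for an established library rather than hand-rolled code.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this row drives the gradient.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy table: two standardized features per row, binary outcome.
X = [[-1.0, -1.2], [-0.8, -0.9], [-1.1, -0.7],
     [0.9, 1.0], [1.2, 0.8], [1.0, 1.1]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
```

The fitted weights can be read directly as the direction and strength of each column's effect, which is part of why this kind of model is easier to explain and audit than a black-box approach.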

Educate customers towards getting real value.

History, neural networks became popular, due to availability of data.  Successful on a very small set of applications.  Anything that a human being can do quickly, video-audio-words.  But have age or income, probably won’t get a neural network that works better than logistic regression.  But neural networks would solve problems that weren’t solvable on logistic regression.  Reputation, but then people trying to use neural networks where they don’t apply, in video or audio types.

Question:  Domain-specific knowledge?

Clients bring a lot of domain-specific knowledge.  Haven’t been in a situation where they’ve asked for a specific algorithm.  It has to make business sense for them.

Working with data, have to understand that data.

Question:  16% adoption rate?

16% adoption rate in Canada, in small and large businesses.

Question:  Trust.  Worry about lack of regulations?  Moving target? Locking things down?

Smaller companies don’t think about regulatory aspects.  Larger companies are working with regulators.  Trust issue.

Decisions that are made will become more important.  e.g. judge will make judgement, then can verify, and subject to review.  A construct to review black box decisions?  Proprietary?  Can’t review?  Mortgage applications, university applications, bail applications.

Question:  Regulation.

In Canada, sometimes not taking care of own, e.g. GDPR is in Europe.

Question:  Not releasing datasets.  Understanding why AI is making decisions.

A lot of companies think their data is their competitive advantage.  But at the same time, want to get access to others’ data, so have to share.  Startups in Toronto work on how might share insights without sharing data.

For self-driving, Waymo isn’t shared.  Trying to figure the best way to go at it.  Tough.  Sensitivity to cyber attacks.

Emergence of three data blocks of governance. 

  • Geo-political development.  China, U.S. and EU moving different ways, other countries haven’t moved.  Would like to see Canada take leadership position.
  • Economic:  85% of top company value is intellectual capital and brand value.

Regulations under constant review?

Lawyers take a principled approach.  But things are moving quicker.  e.g. data portability, is it mine or company?  IoT and sensor data is at the bleeding edge.  CCTV cameras that got hacked.  Measure twice and cut once.

AI is a marketing word, used badly.  1960s-1970s interpretation of human consciousness.  Younger interpretation of things that do things for you, not conscious.  Executives want adoption, but not definitions.  Black box, magic.  When asked about adopting AI, are they adopting heuristic algorithms that marketers would call AI?

What problem are we trying to solve?  Rate of adaptation, faster or better?  Pace of activity is right?  Whose problem are we trying to solve?  In Canada, have banks, medium-size manufacturing, and lots of small.  Need to find a way to have conversations with IT organization, as only going to give budget.  Technology has enabled a small number of companies to decimate government, control the way we’re living.  Have to look at open source, and then governments take control?

Is there something we can do, if there’s a barrier?

Data science as science.

Research money earmarked as part of IT budget.

Question:  Policy in other jurisdictions?

Transnational, also in NAFTA 2.0.  Constraints by other countries.  If want to set own policies, it’s about economic opportunity.  Don’t want to set up a regulatory framework where companies can’t operate here.

Question:  As users, might we own our own data?

What your phone does, while you’re asleep.  The amount the world knows about you isn’t good.

Trying to write a research proposal, going forward.  How should we approach?

Panel not right format?

Closing, 1 minute each.

Great technology, fourth industrial revolution, will make changes, do have to approach with caution.

What is machine learning useful for, what isn’t it useful for?  It’s mixed up.

AI, ML, data science — it’s the future, right conditions.  Need to do more education of ourselves.  There’s a lot we don’t know.

We do get to create a policy framework.

#artficial-intelligence, #cascon, #data-science, #machine-learning