2019/11/05 13:15 “Barriers to Data Science Adoption: Why Existing Frameworks Aren’t Working”, Workshop at CASCON-Evoke, Markham, Ontario

Workshop led by @RohanAlexander and @prof_lyons at #CASCONxEvoke on “Barriers to Data Science Adoption: Why Existing Frameworks Aren’t Working”, with the following abstract.

Broadly, data science is an interdisciplinary scientific approach that provides methods to understand and solve problems in an evidence-based manner, using data and experience. Despite the clear benefits of adoption, many firms face challenges, whether legal, organisational, or related to business practices, when seeking to implement and embed data science within an existing framework.

In this workshop, panel and audience members draw on their experiences to elaborate on the challenges encountered when attempting to deploy data science within existing frameworks. Panel and audience members are drawn from business, academia, and think-tanks. For discussion purposes the challenges are grouped within three themes: regulatory; investment; and workforce.

The regulatory framework governing data science is outdated and fragmented, and for many new developments, regulations are in a state of flux or non-existent. This creates an uncertain environment for investment in data science and can create barriers to the widespread adoption of state-of-the-art data science. For instance, the governance of data use and data sharing is unclear, and this may compromise trust in data. Additionally, privacy laws, currently under scrutiny in many countries, may limit how firms can use data in the near future, affecting innovation and planned investments (e.g., Google Sidewalk). As data science technologies and applications change rapidly, the regulatory framework must continually evolve or risk becoming outdated and a hindrance to developments in the field.

Investment risk exists for any project, but data science projects are especially risky for various reasons, including the fundamental role that datasets play. Creating, cleaning, updating, and securing a dataset is a difficult process that requires a substantial investment of resources. And while these processes are essential for extracting value from data science, they rarely provide value themselves, which can be a challenge when making a business case and investment decision, and adds risk to the decision to adopt data science practices, especially for small- and medium-sized businesses.

The workforce challenges of data science are extensive. It is difficult to recruit qualified candidates due to the specific skill sets needed, and, with more firms seeking to implement the new innovations, this problem is expected to become worse. Additionally, many fear that the lack of diversity in the current pool of workers may hinder progress in cases where data science applications are context specific and would benefit from subject-matter expertise and a diversity of experience.

Outcomes of the workshop are expected to include a report that lists a set of existing practices and high-level barriers to deployment.

Intro from Rohan Alexander (UToronto iSchool), co-organized with Kelly Lyons (UToronto iSchool), Michelle Alexopoulos (UToronto Economics), and Lisa Austin (UToronto Law)

Data science adoption doesn’t seem to have changed over the past 5 to 10 years

Three themes:

  • Legal frameworks, consent issues, interacting with other jurisdictions
  • Organizational challenges:  Difficult to add to an old organization, lack of qualified candidates, lack of diversity, pipeline issue of graduates going to other countries
  • Risks:  Have to get clean datasets, so rational at 5% makes sense, or allocation of resources?

Submit questions to Slido.com, #L763

This digest was created in real-time during the meeting, based on the speaker’s presentation(s) and comments from the audience. The content should not be viewed as an official transcript of the meeting, but only as an interpretation by a single individual. Lapses, grammatical errors, and typing mistakes may not have been corrected. Questions about content should be directed to the originator. The digest has been made available for purposes of scholarship, posted by David Ing.

Panel discussants

CASCONxEvoke Workshop Panel

Omni.ai

  • Launched by Deloitte 5 years ago
  • Ran survey, four themes
  • Found 16% adoption of AI in industry
  • 1. Lack of understanding:  Only 5% of Canadians think that they will be impacted by AI over the next 5 years, despite having smartphones.
  • 2. Lack of trust:  Data breaches, misuse of data.  Killer robots are not what machine learning is about; the Boston Dynamics video creates misconceptions.  Also chatbots used in customer care are fancy versions of press 1 for this, press 2 for that, yet people use terms like “computers are seeing”.  Computer systems seem omnipresent, and people don’t trust the decision-makers.
  • 3. Lack of awareness:  In Toronto there is an ecosystem of startups, but it is difficult for them to link to enterprise companies.  Not getting in front of decision-makers.  Enterprises feel the risk of dealing with startups that may not be around in a few years.  Hard to advertise, misuse of language.
  • 4. Inability to scale:  Companies don’t know how to adopt.  They may hire data scientists, put them in a corner, and think they’ll do cool stuff and make money.  Have to think of ROI from the beginning.  May not have incentives to put work into production after it is done.  Prove to me it works, versus assume that it’s going to work.

Ajiolomohi Egwaikhide, Senior Data Scientist, IBM Systems

  • What can go wrong?  Bad algorithm, or bad data
  • Customers want to take data and do cool stuff, but don’t have enough data or the right data to solve the business problem.  Then they end up with backlash.
  • Bad data: 
    • a. Insufficient quantity
    • b. Non-representative training data, or the data isn’t telling them what they think it is.
    • c. Quality of data:  a lot of outliers, noise, missing data.  They don’t know what they should be collecting.  (See the data-check sketch after this list.)
    • d. Irrelevant features:  Lots of columns in the database, but no business capabilities around them
  • Bad algorithm:
    • a. Using fancy algorithms instead of simple models, e.g. survivor algorithm versus simpler logistic regression.  Not selling the right thing. 
    • b. Underfitting
  • People jumping into data
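
A minimal sketch of the kinds of checks implied by the “bad data” list above, assuming a pandas DataFrame with a binary target column; the column names, thresholds, and demo data are illustrative, not from the talk.

```python
import numpy as np
import pandas as pd

def basic_data_checks(df: pd.DataFrame, target: str = "target") -> None:
    """Quick screens for the 'bad data' issues listed above."""
    print(f"Rows: {len(df)}")  # (a) insufficient quantity is visible immediately

    # (b) non-representative data: check the class balance of the target
    print("Target balance:\n", df[target].value_counts(normalize=True))

    # (c) quality: missing values and crude outlier counts per numeric column
    print("Missing values:\n", df.isna().sum())
    numeric = df.select_dtypes(include=[np.number]).drop(columns=[target], errors="ignore")
    z = (numeric - numeric.mean()) / numeric.std(ddof=0)
    print("Rows with |z| > 3 in any column:", int((z.abs() > 3).any(axis=1).sum()))

    # (d) irrelevant features: constant columns carry no signal
    print("Constant columns:", [c for c in numeric.columns if numeric[c].nunique() <= 1])

if __name__ == "__main__":
    demo = pd.DataFrame({"age": [25, 30, None, 45], "income": [50, 52, 51, 900],
                         "flag": [1, 1, 1, 1], "target": [0, 1, 0, 1]})
    basic_data_checks(demo)
```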

Inmar Givoni, Uber Self-Driving Automobile Division

  • Haven’t defined adoption.
  • John McCarthy said if it works, don’t call it artificial intelligence.
  • There’s a lot of adoption, e.g. a smartphone has 100 instances of what we might call AI.
  • Legal aspects:  e.g. supervised deep learning algorithms in medical imaging run into issues with privacy and disagreements among experts on labels; these should otherwise be solvable.
  • Risks:  The idea of killer robots.  The self-driving paradox: a 10% improvement would mean 1.2M road deaths instead of 1.3M, but that isn’t a personalized argument.
  • Technical:  Coming from software engineering, when coding an algorithm you get a precision or metric of interest; you could have messed up and you wouldn’t know, because it isn’t testable in the same way as regular software.  If parameters can be tuned but you don’t have a deep understanding or mathematical intuition, you get people just throwing data at it.  Irresponsible use.
  • Frameworks (e.g. TensorFlow) are still experimental, missing debugging and control flow.
  • Policy:  Technology is ahead of the law.  Ethical considerations, e.g. people messing up traffic signs.  Will continue working on robustness, but people should go to jail for tearing down a traffic sign.
  • Productionization:  Data scientists prototype quickly in a sandbox environment, load data, train, report metrics, and say it will work.  But a production system is streaming and works in real time; it doesn’t care about models, it cares about output and costs.  E.g. build a detector that is 5% better, but then the car doesn’t work as well.  There is not a good correlation between model-level metrics and system-level metrics.

Legal perspective (Aaron?)

  • Barrier to adoption:
  • (i) Regulatory:  Laws are antiquated.  Cambridge Analytica, etc.; the regime is based on the consent model.  People don’t read the terms they click on.  Transparency.  Dealing with disclosure.  We don’t know what we’re agreeing to.
  • (ii) Investment and risk:  Big undertakings, expensive.  Senior management vs. data scientists.  Companies treating data science as just another project.  Data quality.  Considered in a silo.  No architecture for data.
  • (iii) Workforce, training and labour market:  Requires a lot of expertise, and there’s a shortage of people.  Difficult to recruit people with skills.  Expected to become worse.  Lack of qualified labour.
  • Are there ways that universities could be more involved?  Can we build universities, or training protocols, within companies?
  • Data science will become more important, not less.  How to handle?

Panel discussion together.

Clients come to consultants because they want to push something forward, and say “we can do it ourselves”.  Data scientists are put into a position of “just solve this”.

Bad reputation on trust and ethics.

  • Data science equated to steam engine and electricity.
  • Clients aware that they should be doing something.
  • They know the what, but then trust and ethics come in.
  • In banking, lots of accountability associated with models in production.
  • Line between statistics and machine learning is blurry.
  • Struggling with black box approaches
  • Lack of trust slows us down
  • How can I do it?

Question:  Reputation of data science, and how it impacts work?

When are we going to get the cars off the road?

  • Companies are heavily invested, but it’s an open-ended scientific problem.
  • We don’t know how to do it, but then companies say we’re going to have it by end of the year.
  • Already have assistive technologies that work well if someone is behind the wheel
  • Robustness at 99.9% in a lot of technologies today
  • But have a lot of variability today.
  • If there is someone behind the wheel who could save it from serious errors, or it is on a given route, or under certain weather conditions, it is different.

Question:  Unsolvable problems.  Researchers show examples of recognizing a cat from a dog, but then people expect we can do cars.  There is a problem of lack of understanding, but it’s not zero-one.  They’re feeling overconfident.  There is a general vision of what AI could do, but we’re not there today.

Interdisciplinarity is complicated.  Executives are not making bad decisions so much as lacking the requisite understanding.  It’s economic, technical, trade, privacy, transnational.  Evolutionary, not a binary on-off.  We will get to better decision-making frameworks, but it will take time.

Question:  Reputation.  People think AI is doing their job; not true, it could augment.  How to correct the messaging?

In high school, at the start of robotics, we were promised 4-day work weeks.  Predict that there will be no such thing as replacement.

People’s jobs change over time, e.g. agriculture.  Call centres are replaced by chatbots.  Even if 80% of mundane work is removed, unemployment isn’t at 50% today from the move out of agriculture.  The problem is the time frame: 5 years or 15 years.

Question:  Automation versus AI?

In school, used term machine learning instead of AI.  Now AI is everything.  Lots of natural language processing.  Technologies are getting more intertwined.

People are myopic about technology.  Many jobs get created in unpredicted ways, for non-technical people.  Gig economy.  How many people get married through online dating apps?  Using AirBnB and Uber, there are rapid changes in life.

People thinking about how should retrain.  Retraining programs by Bank of Canada aren’t being used.

Question (from Twitter):  It’s AI when you’re fundraising, it’s machine learning when you’re building, and it’s logistic regression when it’s implemented.

A first course in machine learning includes logistic regression and linear regression.  If you want to call it AI, call it AI; it doesn’t matter any more.

Question:  Mining sector, predictive maintenance.  Anti-fraud, in banking, 80% of workload is logistic regression. 

If you can structure data in a table, use logistic regression.  You get stability, robustness, convergence.

Educate customers towards getting real value.

Historically, neural networks became popular due to the availability of data.  They have been successful on a very small set of applications: anything that a human being can do quickly, such as video, audio, words.  But with features like age or income, you probably won’t get a neural network that works better than logistic regression.  Neural networks did solve problems that weren’t solvable with logistic regression, hence the reputation, but then people try to use neural networks where they don’t apply, outside those video or audio types.
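
A minimal sketch of the point above, assuming scikit-learn; it cross-validates a logistic regression against a small neural network on stand-in tabular data (the dataset and layer sizes are illustrative, not from the panel).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for tabular features such as age or income; replace with real data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "small_neural_net": make_pipeline(
        StandardScaler(), MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)),
}

# Cross-validated accuracy makes the comparison concrete before reaching for the fancier model.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

If the scores land close together, the simpler, more interpretable model is the better default, which is the panellists’ point.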

Question:  Domain-specific knowledge?

Clients bring a lot of domain-specific knowledge.  Haven’t been in a situation where they’ve asked for a specific algorithm.  It has to make business sense for them.

Working with data, have to understand that data.

Question:  16% adoption rate?

16% adoption rate in Canada, in small and large businesses.

Question:  Trust.  Worry about lack of regulations?  Moving target? Locking things down?

Smaller companies don’t think about regulatory aspects.  Larger companies are working with regulators.  Trust issue.

Decisions that are made will become more important, e.g. a judge will make a judgement that can then be verified and is subject to review.  Is there a construct to review black-box decisions?  Proprietary?  Can’t review?  Mortgage applications, university applications, bail applications.

Question:  Regulation.

In Canada, we sometimes don’t take care of our own regulation, e.g. GDPR is in Europe.

Question:  Not releasing datasets.  Understanding why AI is making decisions.

A lot of companies think their data is their competitive advantage.  But at the same time, they want to get access to others’ data, so they have to share.  Startups in Toronto work on how they might share insights without sharing data.

For self-driving, Waymo’s data isn’t shared.  Trying to figure out the best way to go at it.  Tough.  Sensitivity to cyber attacks.

Emergence of three blocs of data governance.

  • Geo-political development.  China, U.S. and EU moving different ways, other countries haven’t moved.  Would like to see Canada take leadership position.
  • Economic:  85% of top company value is intellectual capital and brand value.

Regulations under constant review?

Lawyers take a principled approach.  But things are moving quicker, e.g. data portability: is it mine or the company’s?  IoT and sensor data are at the bleeding edge.  CCTV cameras that got hacked.  Measure twice and cut once.

AI is a marketing word, used badly.  The 1960s-1970s interpretation was about human consciousness.  The younger interpretation is of things that do things for you, not conscious.  Executives want adoption, but not definitions.  Black box, magic.  When asked about adopting AI, are they adopting heuristic algorithms that marketers would call AI?

What problem are we trying to solve?  Rate of adaptation, faster or better?  Is the pace of activity right?  Whose problem are we trying to solve?  In Canada, there are banks, medium-size manufacturing, and lots of small businesses.  Need to find a way to have conversations with the IT organization, as that is who is going to give budget.  Technology has enabled a small number of companies to decimate government and control the way we’re living.  Have to look at open source, and then do governments take control?

Is there something we can do, if there’s a barrier?

Data science as science.

Research money earmarked as part of IT budget.

Question:  Policy in other jurisdictions?

Transnational, also in NAFTA 2.0.  Constraints by other countries.  If want to set own policies, it’s about economic opportunity.  Don’t want to set up a regulatory framework where companies can’t operate here.

Question:  As users, might we own our own data?

What your phone does while you’re asleep.  The amount the world knows about you isn’t good.

Trying to write a research proposal, going forward.  How should we approach?

Panel not right format?

Closing, 1 minute each.

Great technology, fourth industrial revolution, will make changes, do have to approach with caution.

What is machine learning useful for, what isn’t it useful for?  It’s mixed up.

AI, ML, data science — it’s the future, right conditions.  Need to do more education of ourselves.  There’s a lot we don’t know.

We do get to create a policy framework.

#artificial-intelligence, #cascon, #data-science, #machine-learning

2017/11/07 10:15 Donna Dillenberger, “Cognitive Blockchain”, Cascon

Plenary #cascon @DonnaExplorer IBM Fellow, IBM Research, Global leader of Enterprise Systems

This digest was created in real-time during the meeting, based on the speaker’s presentation(s) and comments from the audience. The content should not be viewed as an official transcript of the meeting, but only as an interpretation by a single individual. Lapses, grammatical errors, and typing mistakes may not have been corrected. Questions about content should be directed to the originator. The digest has been made available for purposes of scholarship, posted by David Ing.

Intro by @mrmindel, Head of IBM Centre for Advanced Studies

[Donna Dillenberger]

Cascon 2017

What is blockchain?

  • Database gets out of sync
  • Blockchain software propagates records onto other databases
  • Why not distributed databases?  Because a distributed database is owned by a single entity
  • Blockchain means no single party controls
  • In addition, a distributed database could have someone deleting records (see the sketch after this list)
  • Can also put smart contracts onto a blockchain:  they change data, or check for conditions before or after commitment
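
A minimal sketch, not any vendor’s implementation, of the hash-chaining idea behind the bullets above: each record is committed together with the hash of the previous record, so deleting or editing an earlier record becomes detectable. Field names are illustrative.

```python
import hashlib
import json

def block_hash(record: dict, prev_hash: str) -> str:
    """Hash the record together with the previous block's hash."""
    payload = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def append_block(chain: list, record: dict) -> None:
    prev_hash = chain[-1]["hash"] if chain else "genesis"
    chain.append({"record": record, "prev": prev_hash, "hash": block_hash(record, prev_hash)})

def verify(chain: list) -> bool:
    """Any edited or deleted block breaks every later hash."""
    prev_hash = "genesis"
    for block in chain:
        if block["prev"] != prev_hash or block["hash"] != block_hash(block["record"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True

chain: list = []
append_block(chain, {"container": "MSKU123", "event": "loaded"})
append_block(chain, {"container": "MSKU123", "event": "sailed"})
chain[0]["record"]["event"] = "tampered"  # simulate someone altering history
print(verify(chain))  # False: the change is detectable
```

A real blockchain adds distributed consensus on top of this, so no single party can rewrite the chain, which is the “no single party controls” point above.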

Cognitive

  • Have descriptive analytics, then can create predictive analytics

IBM Global Trade Digitalization demo (powered by IBM Blockchain)

  • Shipment, Kenya to Rotterdam, then can click on where vessel location is
  • On the blockchain, data comes from IoT sensors and ports
  • As the ship moves, each point puts records on the blockchain — start container tracking, commercial invoice is available, packing list is available, the sensor reports refrigeration
  • Blockchain analytics produce a geophysical map
  • Then can put on sensors, for logistics planning, e.g. weather
  • If the ship is late, how late?
  • If refrigerated and the mangoes aren’t good, who’s at fault?

Not just shipping from export to import countries:  documents

  • Before blockchain, paper was printed and human couriers carried it on the ship — 15% of the international cost, $26B
  • If a way for secure exchanges, savings in billions of dollars

When Kenyan farmer brings produce, can just use mobile phone to upload documents

  • Then Kenyan regional association can approve certificate of origin
  • Smart contracts are dictating a workflow
  • Sanitation department can add certificate onto blockchain
  • All signatures done onto blockchain
  • Then the horticultural association that gave the farmer seeds uploads a commercial invoice so that the coffee can leave
  • At Mombasa customs, there are no lost or forged papers; blockchain means they can’t be deleted
  • Workflow programmed by smart contract, requires all signatures (see the sketch after this list)
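
A minimal sketch of the signature-gated workflow described in this list; it is illustrative Python rather than actual Hyperledger Fabric chaincode, and the role names are assumptions based on the example above.

```python
# Parties whose signatures the export workflow requires before release (assumed set).
REQUIRED_SIGNATURES = {"farmer", "regional_association", "sanitation_department",
                       "horticultural_association", "customs"}

def can_release(shipment: dict) -> bool:
    """Smart-contract-style check: release only once every required party has signed."""
    signed = {entry["role"] for entry in shipment.get("signatures", [])}
    return REQUIRED_SIGNATURES.issubset(signed)

shipment = {
    "id": "KE-2019-001",
    "signatures": [
        {"role": "farmer", "doc": "certificate_of_origin_request"},
        {"role": "regional_association", "doc": "certificate_of_origin"},
        {"role": "sanitation_department", "doc": "sanitation_certificate"},
    ],
}
print(can_release(shipment))  # False until the remaining signatures are recorded on-chain
```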

Data immutable:  health inspections, sanitation, signature of individuals

  • Then can do analytics:  where is the hold-up, e.g. waiting for sanitation certificate
  • U.S. customs asks for this for parts of products, e.g. Ikea shipping parts to the U.S.
  • A major path for opium is in the legs of furniture
  • U.S. customs wants to know that the furniture is coming from Sweden, but the legs may come from Indonesia

Once there are analytics, customers are asking for blockchain data to be combined with natural language processing to deal with compliance

  • Financial services, 30% of cost is just meeting compliance

Cognitive Blockchain demo

  • 1. Ingest regulation
  • 2. Kick off bot
  • 3. Obtain permissions (to see records)
  • 4. Check blockchain records compliance

Australia and the Kimberley process:  to reduce conflict diamonds

  • How to get a Kimberley certificate:  download an 18-page PDF

Have Watson ingest the 18-page PDF

  • IBM Regulatory Analytics service
  • Already has e.g. ingest Dodd-Frank, Basel resolution
  • Want to ingest this new Kimberley document
  • Watson extracts 73 rules

Build a compliance tool, taking those rules

  • Could type in the rule yourself
  • Connect to the blockchain:  records describing the diamond, and surface the Kimberley Certificate
  • Want bot to see when the certificate was created, but not the contents describing the diamond
  • Blockchain has 1,200,000 records; 857,000 are permitted access — can view compliant and non-compliant … there are 113,023 records that are not compliant
  • Before, human beings would have to read ALL of the records
  • Can ask the bot what’s common about the non-compliant records (see the sketch after this list):  they came from particular countries, all in the last quarter
  • AGX has the most non-compliances
  • If the databot allows to see more, could see which inspectors signed off
  • Could combine with weather data, for correlations:  e.g. are all records from countries that have had drought in the last 2 years?
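
A minimal sketch of the follow-up analytics described in this list, assuming the permitted records have been exported to a pandas DataFrame; the file name and the columns compliant, country, quarter, and exporter are assumptions for illustration.

```python
import pandas as pd

# One row per permitted blockchain record (hypothetical export).
records = pd.read_parquet("kimberley_records.parquet")

non_compliant = records[~records["compliant"]]
print(f"{len(non_compliant):,} of {len(records):,} permitted records are non-compliant")

# What do the non-compliant records have in common? Group by country and quarter.
print(non_compliant.groupby(["country", "quarter"]).size().sort_values(ascending=False).head(10))

# Which exporter has the most non-compliances?
print(non_compliant["exporter"].value_counts().head(5))
```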

Cognitive and blockchain:  When records are on the blockchain, how can I validate that a birth certificate is really valid?

  • Created a portable solution:  IBM’s Verifier
  • Can scan drugs, wine, art, luggage … manufactured parts … DNA identification … biological cell imaging … skin tissues … water pollutants … oils, liquids, metals … currencies, passport stamps, birth certificates
  • Can attach IBM Verifier to any cell phone
  • Two vials:  Mobil-1 5w30 and Sunoco 10w30 … could use for olive oils or champagnes

What does cognitive mean?

  • Uses deep learning, uses regression, but these are just models to mine data for insights
  • Cognitive is more than deep learning, because it learns by itself; you don’t have to describe things to it
  • It also recognizes intent, e.g. human emotions
  • e.g. hurricanes are coming in the path of this ship, which will cause a delay, so let’s divert the ship so that mangoes can arrive on time
  • Not waiting for a human to feed it data

Problems with cognitive systems, AI, and analytics in general

  • Working with data
  • 90% of effort is getting data, then transforming data
  • Have to sample correctly
  • Normalize the data (see the sketch after this list)
  • Then, can you trust the data?  Where is it coming from?  What is the pedigree of the data?  (Delusional Tweets of a president?)
  • e.g. drugs reacting differently for different genders, sizes and weights
  • Can you trust the model itself?
  • Academics love to download data from the Internet; what do open source libraries carry?
  • Microservices:  don’t code something when you can download it
  • But the microservice could have been trained on an image of Donna, with an instruction to shut down the power grid when it sees her
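
A minimal sketch of the “sample correctly, then normalize” step mentioned in this list, using scikit-learn; the stand-in data and split ratio are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data; in practice X and y come from a dataset with known provenance.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# A stratified split keeps the class balance, so the sample stays representative.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Fit the scaler on training data only, then apply it to both splits,
# so the test set never leaks into the normalization statistics.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```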

Effects from untrusted data:

  • Poisoned tweets, news, blogs, ads
  • Have impacted elections, Brexit
  • Say that pollutants aren’t affecting air quality
  • Sick persons classified as healthy
  • Anomalies classified as normal
  • As responsible computer scientists: models trained on such data become unusable — false positives could be in dams, electrical grids, infrastructure and autonomous systems

How could blockchain help artificial intelligence?

Use the blockchain to train on data where we know the provenance, where the data came from

  • e.g. drug experiences are from 30-year old females

Blockchain can help AI:

  • Trust:
    • Pedigree
    • Immutability
    • Auditable
  • Confidential
    • Hyperledger Fabric — sharing with confidentiality
    • Records, Grants access rights, requests
  • Provenance
    • Traces ownership and usage across complex provenance chains

Provenance, Walmart’s Food Safety Solution Built on the IBM Blockchain Platform | IBM Blockchain | August 2017 at https://www.youtube.com/watch?v=SV0KXBxSoio

  • Did this project because of food scares
  • e.g. baby formula with melamine
  • e.g. horse meat instead of beef
  • Want to predict when food will spoil, and when the ingredients aren’t quite right
  • On the Walmart cognitive blockchain, didn’t have people type onto a computer; it works with existing systems
  • Interact with humans, the way that humans want to interact, not the way that computers want them to

[Questions]

As a consumer, I would like to find the problem with my egg, but there will be proprietary information, and then there will be a choice of who can see what.  Is there a framework?

  • Blockchain isn’t owned by one entity
  • Hyperledger has a governance policy:  will all clients be able to see information on the blockchain
  • e.g. this blockchain has Kroger, Unilever, etc. … that don’t allow to see participants
  • Bitcoin and public blockchain allow people to see all of the data, and an anonymous person can put on data:  a potential exposure to poisonous data

A provider that doesn’t reveal data (e.g. a patient)?  Can that be broken in an emergency?

  • Looking at different approaches
  • Hyperledger allows roles
  • Could say heart surgeon sees only part of data
  • Dentists can only see that part of data
  • Up to you as patient to see that
  • If hospital owns data, then could have a smart contract, if the person comes in unconscious, might enable anyone to see data
  • Ethereum, Bitcoin, don’t permit these, Hyperledger does

Concerns about data being so secure that it gets lost and no one can get to it?  A data superpower building a back door?  Blockchains growing so large that no one can manage them?

  • There’s a difference between blockchain implementations
  • Bitcoin keeps growing
  • Linux Foundation Hyperledger Fabric, has an activity to archive blockchain
  • e.g. after financial regulation, have to keep all financial records for up to 30 years, and every transaction (trades) has to be recorded; has to have copies for the last day, last week, last month, up to 7 years
  • Financial companies store on tape, up to petabytes, exabytes
  • If blockchain is over 50 years old, archive that
  • Superpower?  True with public data, Ethereum and Bitcoin, anyone can see that
  • But not true with all block chain
  • With Ethereum, it was said that superpowers can’t change records:  when there was a problem, they said they would roll back … but originally, records were to be immutable
  • Hyperledger Fabric protocols:  can add more nodes; if one company or person compromises their node, the other nodes push it out and don’t allow it to rejoin
  • IBM Secure Service Container:  when the blockchain is hosted in IBM Cloud, all of its data is automatically encrypted, not by human, but by hardware that isn’t addressable by software
  • Even if the U.S. government asks for the keys, IBM doesn’t have them
  • This is a response to Edward Snowden, who was a system administrator
  • Blockchain data so secure that it gets lost?  Don’t understand that question, will take offline

#blockchain, #cognitive, #dillenberger, #ibm

Cascon 2012, Eliot Siegel, Opportunities in Healthcare for Big Data

At #Cascon 2012, Eliot Siegel, U. Of Maryland, Big Opportunities in Healthcare for Big Data and Real-Time Analytics.

In radiology, sophisticated medical devices, but primitive IT like the 1970s.

Typical doctor only reads 2 to 3 hours per month, while amount of data doubles over 5 years.

Last year spoke on Watson Q/A project, will speak quickly on opportunities.

Hottest application is on fraud detection.

Population analytics, particularly at payer level, then drill down e.g. diagnostic codes for at risk segments.

Evaluation of patient cohorts, e.g. with prostheses.

Predictive / opportunity analysis, possible readmission.

Diagnostic imaging, from terabytes now to petabytes. A cardiac study could have 1500 scans. Functional MRI with iterative reconstruction. How to scale up, send out.

Images have rich metadata. Automatic anatomy identification. Tumour characteristics, can tailor to patient and type of cancer.

Based on genomic, image and clinical data, could generate survival rate probability graph.

Patient signal information, e.g. pulse, oxygen. In future, will monitor specific patients, try to predict when they will come in.

Personalized medicine, DNA sequencing is under $1000, will continue to fall. Stephen Gold says analytical market will be $76 billion by 2015.

Watson: natural language, hypothesis generation, then evidence-based learning.

IBM patient care and insights, not just from electronic medical record, but also socioeconomic, where they’ve travelled, etc.

Pre-personalized medicine. In 2008, Kennedy had seizure, found malignant glioma. But no history, information systems not connected to compare similar cases.

Want to transcend clinical trials.

Compare Google search to question asking on probable diagnoses. Want analytic decision-support.

How to consume Electronic Medical Record? Unstructured.

Brought in Jeopardy team. Case of rash. No way to get links, will copy and paste. No person in charge of patient problem list.

Multiple clinical trials, with tens of thousands of patients. After published, no access to data.

Scott Duvall will talk about VA’s corporate data warehouse Vinci. Can’t export data, so need analytics in house, to answer questions.

Everyone guards their own databases, valuable for grant proposals

Million veteran project.

Want synthesis / display of complex information in EMR, would like clever graphical way, then drill down.

Would like to work in emergency room, before EMR is written.

So many amazing big data challenges in healthcare and medical imaging. Ten to fifteen years behind.

#siegel

Cascon 2012, Joanna Ng, Design Approach to Informatics Platform

Joanna Ng, IBM, #Cascon 2012 workshop on Analytics of Big Data Generated by Healthcare and Personalized Medicine domain.

Process of fixing bugs isn’t enough.

Compare software development to film production.

Start with big data platform, used for health informatics.

Stereotype users, then understand purposes, goals, tasks, interactions.

First problem space, then ideation of shifts, in contrast to engineering approach.

Informatics as analytics, information retrieval, semantic reasoning, data mining.

Data: unstructured, what is trustworthy? Could find some drug more efficacious, since all patients are different.

Predictive analytics: e.g. which chemotherapy will statistically yield the longest survival years, for the physician at time of decision.

Data mining for new patterns on inferences, associations

First challenge: what question should we ask? Doctor can’t look at data when patient is in front of him/her.

Interface as question / answer with natural language, or search query?

Data entitlement: directly manipulate, use, view; expose / exchange data

Functional entitlement: direct or indirect data operations; see or can’t see

Need this understanding to design the interface to achieve goals

#siegel