A Framework for Reconciling Attribute Values from Multiple Data Sources

Because of the heterogeneous nature of different data sources, data integration is often one of the most challenging tasks in managing modern information systems. While the existing literature has focused on problems such as schema integration and entity identification, it has largely overlooked a b...

Journal Title: Management Science 2007, Vol.53 (12), p.1946-1963
Main Author: Jiang, Zhengrui
Other Authors: Sarkar, Sumit , De, Prabuddha , Dey, Debabrata
Format: Electronic Article
Language: English
Subjects:
Source: Alma/SFX Local Collection
Publisher: Linthicum, MD: INFORMS
ID: ISSN: 0025-1909
Staff View
recordid: cdi_crossref_primary_10_1287_mnsc_1070_0745
title: A Framework for Reconciling Attribute Values from Multiple Data Sources
format: Article
creator:
  • Jiang, Zhengrui
  • Sarkar, Sumit
  • De, Prabuddha
  • Dey, Debabrata
subjects:
  • Accountancy
  • Applied sciences
  • Business enterprises
  • Business studies
  • Computer programming
  • Cost estimates
  • Data aggregation
  • Data collection
  • data integration
  • data quality
  • Decision analysis
  • Decision theory. Utility theory
  • Error rates
  • Exact sciences and technology
  • False negative errors
  • heterogeneous databases
  • Information attributes
  • Information management
  • Knowledge management
  • Management information systems
  • Management science
  • Misrepresentation
  • misrepresentation error
  • Operational research and scientific management
  • Operational research. Management science
  • probabilistic databases
  • Probability
  • Statistical methods
  • Studies
  • Systems integration
  • Total costs
  • type I error
  • type II error
ispartof: Management Science, 2007, Vol.53 (12), p.1946-1963
description: Because of the heterogeneous nature of different data sources, data integration is often one of the most challenging tasks in managing modern information systems. While the existing literature has focused on problems such as schema integration and entity identification, it has largely overlooked a basic question: When an attribute value for a real-world entity is recorded differently in different databases, how should the "best" value be chosen from the set of possible values? This paper provides an answer to this question. We first show how a probability distribution over a set of possible values can be derived. We then demonstrate how these probabilities can be used to solve a given decision problem by minimizing the total cost of type I, type II, and misrepresentation errors. Finally, we propose a framework for integrating multiple data sources when a single "best" value has to be chosen and stored for every attribute of an entity.
language: eng
source: Alma/SFX Local Collection
identifier: ISSN: 0025-1909
fulltext: fulltext
issn:
  • 0025-1909
  • 1526-5501


identifier:
  • ISSN: 0025-1909
  • EISSN: 1526-5501
  • DOI: 10.1287/mnsc.1070.0745
  • CODEN: MSCIAM
rights:
  • Copyright 2007 INFORMS
  • 2008 INIST-CNRS
  • COPYRIGHT 2007 Institute for Operations Research and the Management Sciences
  • Copyright Institute for Operations Research and the Management Sciences Dec 2007
lds50: peer_reviewed
oa: free_for_read
backlink:
  • http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=19919751 (View record in Pascal Francis)
  • http://econpapers.repec.org/article/inmormnsc/v_3a53_3ay_3a2007_3ai_3a12_3ap_3a1946-1963.htm (View record in RePEc)
collection:
  • Pascal-Francis
  • RePEc IDEAS
  • RePEc
  • CrossRef
  • ProQuest Central (Corporate)
  • ABI/INFORM Collection
  • ABI/INFORM Global (PDF only)
  • Entrepreneurship Database
  • ProQuest Central (purchase pre-March 2016)
  • ABI/INFORM Global (Alumni Edition)
  • Healthcare Administration Database (Alumni)
  • Psychology Database (Alumni)
  • Entrepreneurship Database (Alumni Edition)
  • ProQuest Pharma Collection
  • International Bibliography of the Social Sciences (IBSS)
  • Hospital Premium Collection
  • Hospital Premium Collection (Alumni Edition)
  • ProQuest Central (Alumni) (purchase pre-March 2016)
  • ABI/INFORM Collection (Alumni Edition)
  • ProQuest Central (Alumni Edition)
  • ProQuest Central Essentials
  • ProQuest Central
  • Business Premium Collection
  • ProQuest Central Korea
  • International Bibliography of the Social Sciences
  • Business Premium Collection (Alumni)
  • Health Research Premium Collection
  • ABI/INFORM Global (Corporate)
  • Health Research Premium Collection (Alumni)
  • ProQuest Central Student
  • International Bibliography of the Social Sciences
  • ProQuest Business Collection (Alumni Edition)
  • ProQuest Business Collection
  • ABI/INFORM Professional Advanced
  • ABI/INFORM Global
  • Healthcare Administration Database
  • Psychology Database
  • ProQuest One Business
  • ProQuest One Business (Alumni)
  • ProQuest One Academic Eastern Edition
  • ProQuest One Academic
  • ProQuest One Academic UKI Edition
  • ProQuest Central China
  • ABI/INFORM Collection China
  • ProQuest Central Basic
  • OpenAIRE (Open Access)
  • OpenAIRE
addata:
au:
  • Jiang, Zhengrui
  • Sarkar, Sumit
  • De, Prabuddha
  • Dey, Debabrata
format: journal
genre: article
ristype: JOUR
atitle: A Framework for Reconciling Attribute Values from Multiple Data Sources
jtitle: Management Science
seriestitle: Management Science
date: 2007-12-01
risdate: 2007
volume: 53
issue: 12
spage: 1946
epage: 1963
pages: 1946-1963
issn: 0025-1909
eissn: 1526-5501
coden: MSCIAM
cop: Linthicum, MD
pub: INFORMS
doi: 10.1287/mnsc.1070.0745
oa: free_for_read