Matches in Ruben’s data for { ?s <http://schema.org/abstract> ?o }
- 978-3-319-98192-5_8 abstract "Data management increasingly demands transparency with respect to data processing. Various stakeholders need information tailored to their needs, e.g. data management plans (DMP) for funding agencies or privacy policies for the public. DMPs and privacy policies are just two examples of documents describing aspects of data processing. Dedicated tools to create both already exist. However, creating each of them manually or semi-automatically remains a repetitive and cognitively challenging task. We propose a data-driven approach that semantically represents the data processing itself as workflows and serves as a base for different kinds of result-sets, generated with SPARQL, i.e. DMPs. Our approach is threefold: (i) users with domain knowledge semantically represent workflow components; (ii) other users can reuse these components to describe their data processing via semantically enhanced workflows; and, based on the semantic workflows, (iii) result-sets are automatically generated on-demand with SPARQL queries. This paper demonstrates our tool that implements the proposed approach, based on a use-case of a researcher who needs to provide a DMP to a funding agency to approve a proposed research project.".
- 978-3-662-46641-4 abstract "If we want automated agents to consume the Web, they need to understand what a certain service does and how it relates to other services and data. The shortcoming of existing service description paradigms is their focus on technical aspects instead of the functional aspect—what task does a service perform, and is this a match for my needs? This paper summarizes our recent work on RESTdesc, a semantic service description approach that centers on functionality. It has a solid foundation in logics, which enables advanced service matching and composition, while providing elegant and concise descriptions, responding to the demands of automated clients on the future Web of Agents.".
- 978-3-662-46641-4_20 abstract "In an often retweeted Twitter post, entrepreneur and software architect Inge Henriksen described the relation of Web 1.0 to Web 3.0 as: “Web 1.0 connected humans with machines. Web 2.0 connected humans with humans. Web 3.0 connects machines with machines.” On the one hand, an incredible amount of valuable data is described by billions of triples, machine-accessible and interconnected thanks to the promises of Linked Data. On the other hand, REST is a scalable, resource-oriented architectural style that, like the Linked Data vision, recognizes the importance of links between resources. Hypermedia APIs are resources, too—albeit dynamic ones—and unfortunately, neither Linked Data principles, nor the REST-implied self-descriptiveness of hypermedia APIs sufficiently describe them to allow for long-envisioned realizations like automatic service discovery and composition. We argue that describing inter-resource links—similarly to what the Linked Data movement has done for data—is the key to machine-driven consumption of APIs. In this paper, we explain how the description format RESTdesc captures the functionality of APIs by explaining the effect of dynamic interactions, effectively complementing the Linked Data vision.".
- s00799-014-0136-9 abstract "In this article, we present PREMIS OWL. This is a semantic formalisation of the PREMIS 2.2 data dictionary of the Library of Congress. PREMIS 2.2 are metadata implementation guidelines for digitally archiving information for the long term. Nowadays, the need for digital preservation is growing. A lot of the digital information produced merely a decade ago is in danger of getting lost as technologies are changing and getting obsolete. This also threatens a lot of information from heritage institutions. PREMIS OWL is a semantic long-term preservation schema. Preservation metadata are actually a mixture of provenance information, technical information on the digital objects to be preserved and rights information. PREMIS OWL is an OWL schema that can be used as data model supporting digital archives. It can be used for dissemination of the preservation metadata as Linked Open Data on the Web and, at the same time, for supporting semantic web technologies in the preservation processes. The model incorporates 24 preservation vocabularies, published by the LOC as SKOS vocabularies. Via these vocabularies, PREMIS descriptions from different institutions become highly interoperable. The schema is approved and now managed by the Library of Congress. The PREMIS OWL schema is published at http://www.loc.gov/premis/rdf/v1.".
- s10619-017-7211-3 abstract "Fast, massive, and viral data diffused on social media affects a large share of the online population, and thus, the (prospective) information diffusion mechanisms behind it are of great interest to researchers. The (retrospective) provenance of such data is equally important because it contributes to the understanding of the relevance and trustworthiness of the information. Furthermore, computing provenance in a timely way is crucial for particular use cases and practitioners, such as online journalists that promptly need to assess particular pieces of information. Social media currently provide insufficient mechanisms for provenance tracking, publication and generation, while state-of-the-art on social media research focuses mainly on explicit diffusion mechanisms (like retweets in Twitter or reshares in Facebook). The implicit diffusion mechanisms remain understudied due to the difficulties of being captured and properly understood. From a technical side, the state of the art for provenance reconstruction evaluates small datasets after the fact, sidestepping requirements for scale and speed of current social media data. In this paper, we investigate the mechanisms of implicit information diffusion by computing its fine-grained provenance. We prove that explicit mechanisms are insufficient to capture influence and our analysis unravels a significant part of implicit interactions and influence in social media. Our approach works incrementally and can be scaled up to cover a truly Web-scale scenario like major events. The results show that (on a single machine) we can process datasets consisting of up to several millions of messages at rates that cover bursty behaviour, without compromising result quality. By doing that, we provide to online journalists and social media users in general, fine-grained provenance reconstruction which sheds light on implicit interactions not captured by social media providers. These results are provided in an online fashion which also allows for fast relevance and trustworthiness assessment.".
- s11042-010-0709-6 abstract "Automatic generation of metadata, facilitating the retrieval of multimedia items, potentially saves large amounts of manual work. However, the high specialization degree of feature extraction algorithms makes them unaware of the context they operate in, which contains valuable and often necessary information. In this paper, we show how Semantic Web technologies can provide a context that algorithms can interact with. We propose a generic problem-solving platform that uses Web services and various knowledge sources to find solutions to complex requests. The platform employs a reasoner-based composition algorithm, generating an execution plan that combines several algorithms as services. It then supervises the execution of this plan, intervening in case of errors or unexpected behavior. We illustrate our approach by a use case in which we annotate the names of people depicted in a photograph.".
- s11042-014-2445-9 abstract "Due to the ubiquitous Web-connectivity and portable multimedia devices, it has never been so easy to produce and distribute new multimedia resources such as videos, photos, and audio. This ever increasing production leads to an information overload for consumers, which calls for efficient multimedia retrieval techniques. Multimedia can be efficiently retrieved using its metadata, but the multimedia analysis methods that can automatically generate this metadata are currently not reliable enough for highly diverse multimedia content. A reliable and automatic method for analyzing general multimedia content is needed. We introduce a domain-agnostic framework that annotates multimedia resources using currently available multimedia analysis methods. By using a three-step reasoning cycle, this framework can assess and improve the quality of multimedia analysis results, by consecutively (1) combining analysis results effectively, (2) predicting which results might need improvement, and (3) invoking compatible analysis methods to retrieve new results. By using semantic descriptions for the Web services that wrap the multimedia analysis methods, compatible services can be automatically selected. By using additional semantic reasoning on these semantic descriptions, the different services can be repurposed across different use cases. We evaluated this problem-agnostic framework in the context of video face detection, and showed that it is capable of providing the best analysis results regardless of the input video. The proposed methodology can serve as a basis to build a generic multimedia annotation platform, which returns reliable results for diverse multimedia analysis problems. This allows for better metadata generation, and improves the efficient retrieval of multimedia resources.".
- s11761-018-0234-4 abstract "Over the last decade, Web services composition has become a thriving area of research and development endeavors for application integration and interoperability. Although Web services composition has been heavily investigated, several issues still need to be addressed. In this paper, we mainly discuss two major bottlenecks in the current process of modeling compositions. The first bottleneck is related to the level of expertise required to achieve a composition process. Typical procedural styles of modeling, inspired by the workflow/business process paradigm, do not provide the required abstractions. Therefore, they fail to support dynamic and self-managed compositions able to adapt to unpredictable changes. The second bottleneck in current service compositions concerns their life cycle and their management, also called their governance. In this context, we propose a declarative proof-based approach to Web service composition. Based on the three stages of pre-composition, abstraction, and composition, our solution provides an easy way to specify functional and non-functional requirements of composite services in a precise and declarative manner. It guides the user through the composition process while allowing detection and recovery of violations at both design and run-time using proofs and planning. Experiment results clearly show the added value of the proof-based solution as a viable strategy to improve the composition process.".
- j.future.2019.10.006 abstract "Functions are essential building blocks of information retrieval and information management. However, efforts implementing these functions are fragmented: one function has multiple implementations, within specific development contexts. This inhibits reuse: metadata of functions and associated implementations need to be found across various search interfaces, and implementation integration requires human interpretation and manual adjustments. An approach is needed, independent of development context and enabling description and exploration of functions and (automatic) instantiation of associated implementations. In this paper, after collecting scenarios and deriving corresponding requirements, we (i) propose an approach that facilitates functions’ description, publication, and exploration by modeling and publishing abstract function descriptions and their links to concrete implementations; and (ii) enable implementations’ automatic instantiation by exploiting those published descriptions. This way, we can link to existing implementations, and provide a uniform detailed search interface across development contexts. The proposed model (the Function Ontology) and the publication method following the Linked Data principles using standards, are deemed sufficient for this task, and are extensible to new development contexts. The proposed set of tools (the Function Hub and Function Handler) are shown to fulfill the collected requirements, and the user evaluation proves them being perceived as a valuable asset during software retrieval. Our work thus improves developer experience for function exploration and implementation instantiation.".
- j.jbi.2017.05.006 abstract "The volume and diversity of data in biomedical research has been rapidly increasing in recent years. While such data hold significant promise for accelerating discovery, their use entails many challenges including: the need for adequate computational infrastructure, secure processes for data sharing and access, tools that allow researchers to find and integrate diverse datasets, and standardized methods of analysis. These are just some elements of a complex ecosystem that needs to be built to support the rapid accumulation of these data. The NIH Big Data to Knowledge (BD2K) initiative aims to facilitate digitally enabled biomedical research. Within the BD2K framework, the Commons initiative is intended to establish a virtual environment that will facilitate the use, interoperability, and discoverability of shared digital objects used for research. The BD2K Commons Framework Pilots Working Group (CFPWG) was established to clarify goals and work on pilot projects that would address existing gaps toward realizing the vision of the BD2K Commons. This report reviews highlights from a two-day meeting involving the BD2K CFPWG to provide insights on trends and considerations in advancing Big Data science for biomedical research in the United States.".
- j.websem.2016.03.003 abstract "Billions of Linked Data triples exist in thousands of RDF knowledge graphs on the Web, but few of those graphs can be queried live from Web applications. Only a limited number of knowledge graphs are available in a queryable interface, and existing interfaces can be expensive to host at high availability. To mitigate this shortage of live queryable Linked Data, we designed a low-cost Triple Pattern Fragments interface for servers, and a client-side algorithm that evaluates SPARQL queries against this interface. This article describes the Linked Data Fragments framework to analyze Web interfaces to Linked Data and uses this framework as a basis to define Triple Pattern Fragments. We describe client-side querying for single knowledge graphs and federations thereof. Our evaluation verifies that this technique reduces server load and increases caching effectiveness, which leads to lower costs to maintain high server availability. These benefits come at the expense of increased bandwidth and slower, but more stable query execution times. These results substantiate the claim that lightweight interfaces can lower the cost for knowledge publishers compared to more expressive endpoints, while enabling applications to query the publishers’ data with the necessary reliability.".
- j.websem.2017.12.003 abstract "Visual tools are implemented to help users in defining how to generate Linked Data from raw data. This is possible thanks to mapping languages which enable detaching mapping rules from the implementation that executes them. However, no thorough research has been conducted so far on how to visualize such mapping rules, especially if they become large and require considering multiple heterogeneous raw data sources and transformed data values. In the past, we proposed the RMLEditor, a visual graph-based user interface, which allows users to easily create mapping rules for generating Linked Data from raw data. In this paper, we build on top of our existing work: we (i) specify a visual notation for graph visualizations used to represent mapping rules, (ii) introduce an approach for manipulating rules when large visualizations emerge, and (iii) propose an approach to uniformly visualize data fraction of raw data sources combined with an interactive interface for uniform data fraction transformations. We perform two additional comparative user studies. The first one compares the use of the visual notation to present mapping rules to the use of a mapping language directly, which reveals that the visual notation is preferred. The second one compares the use of the graph-based RMLEditor for creating mapping rules to the form-based RMLx Visual Editor, which reveals that graph-based visualizations are preferred to create mapping rules through the use of our proposed visual notation and uniform representation of heterogeneous data sources and data values.".
- j.websem.2019.04.001 abstract "Since the invention of Notation3 Logic, several years have passed in which the theory has been refined and applied in different reasoning engines like cwm, EYE, and FuXi. But despite these developments, a clear formal definition of Notation3’s semantics is still missing. This does not only form an obstacle for the formal investigation of that logic and its relations to other formalisms, it has also practical consequences: in many cases the interpretations of the same formula differ between reasoning engines. In this paper we tackle one of the main sources of that problem, namely the uncertainty about implicit quantification. This refers to Notation3’s ability to use bound variables for which the universal or existential quantifiers are not explicitly stated, but implicitly assumed. We provide a tool for clarification through the definition of a core logic for Notation3 that only supports explicit quantification. We specify an attribute grammar which maps Notation3 formulas to that logic according to the different interpretations and thereby define the semantics of Notation3. This grammar is then implemented and used to test the impact of the differences between interpretations on practical cases. Our dataset includes Notation3 implementations from former research projects and test cases developed for the reasoner EYE. We find that 31% of these files are understood differently by different reasoners. We further analyse these cases and categorize them in different classes of which we consider one most harmful: if a file is manually written by a user and no specific built-in predicates are used (13% of our critical files), it is unlikely that this user is aware of possible differences. We therefore argue the need to come to an agreement on implicit quantification, and discuss the different possibilities.".
- S1471068416000016 abstract "Machine clients are increasingly making use of the Web to perform tasks. While Web services traditionally mimic remote procedure calling interfaces, a new generation of so-called hypermedia APIs works through hyperlinks and forms, in a way similar to how people browse the Web. This means that existing composition techniques, which determine a procedural plan upfront, are not sufficient to consume hypermedia APIs, which need to be navigated at runtime. Clients instead need a more dynamic plan that allows them to follow hyperlinks and use forms with a preset goal. Therefore, in this article, we show how compositions of hypermedia APIs can be created by generic Semantic Web reasoners. This is achieved through the generation of a proof based on semantic descriptions of the APIs’ functionality. To pragmatically verify the applicability of compositions, we introduce the notion of pre-execution and post-execution proofs. The runtime interaction between a client and a server is guided by proofs but driven by hypermedia, allowing the client to react to the application’s actual state indicated by the server’s response. We describe how to generate compositions from descriptions, discuss a computer-assisted process to generate descriptions, and verify reasoner performance on various composition tasks using a benchmark suite. The experimental results lead to the conclusion that proof-based consumption of hypermedia APIs is a feasible strategy at Web scale.".
- S1471068423000054 abstract "Link traversal–based query processing (LTQP), in which a SPARQL query is evaluated over a web of documents rather than a single dataset, is often seen as a theoretically interesting yet impractical technique. However, in a time where the hypercentralization of data has increasingly come under scrutiny, a decentralized Web of Data with a simple document-based interface is appealing, as it enables data publishers to control their data and access rights. While LTQP allows evaluating complex queries over such webs, it suffers from performance issues (due to the high number of documents containing data) as well as information quality concerns (due to the many sources providing such documents). In existing LTQP approaches, the burden of finding sources to query is entirely in the hands of the data consumer. In this paper, we argue that to solve these issues, data publishers should also be able to suggest sources of interest and guide the data consumer toward relevant and trustworthy data. We introduce a theoretical framework that enables such guided link traversal and study its properties. We illustrate with a theoretic example that this can improve query results and reduce the number of network requests. We evaluate our proposal experimentally on a virtual linked web with specifications and indeed observe that not just the data quality but also the efficiency of querying improves.".
- iet-its.2016.0269 abstract "The European Data Portal shows a growing number of governmental organisations opening up transport data. As end users need traffic or transit updates on their day-to-day travels, route planners need access to this government data to make intelligent decisions. Developers, however, will not integrate a dataset when the cost for adoption is too high. In this paper, the authors study the internal and technological challenges to publish data from the Department of Transport and Public Works in Flanders for maximum reuse. Using the qualitative Engage STakeholdErs through a systEMatic toolbox (ESTEEM) research approach, they interviewed 27 governmental data owners and organised both an internal workshop and a matchmaking workshop. In these workshops, data interoperability was discussed on four levels: legal, syntactic, semantic and querying. The interviews were summarised in ten challenges to which possible solutions were formulated. The effort needed to reuse existing public datasets today is high, yet they see the first evidence of datasets being reused in a legally and syntactically interoperable way. Publishing data so that it is reusable in an affordable way is still challenging.".
- 10494820.2017.1343191 abstract "An e-TextBook can serve as an Interactive Learning Environment (ILE), facilitating more effective teaching and learning processes. In this paper, we propose the novel concept of an EPUB 3-based Hybrid e-TextBook, which allows for interaction between the digital and the physical world. In that regard, we first investigated the gap between the expectations of teachers with respect to e-TextBook functionalities, on the one hand, and the ILE functionalities offered by e-TextBooks, on the other hand. Next, together with teachers, we co-designed and developed prototype EPUB 3-based Hybrid e-TextBooks that make it possible to connect their learning content to smart devices in classrooms, leveraging both digital publishing and Semantic Web tools. Based on experimentation with our prototype Hybrid e-TextBooks, we can argue that a semantically enriched EPUB 3-based Hybrid e-TextBook is able to act as a comprehensive ILE, providing the tools needed by teachers in smart classrooms. Furthermore, expert observations and Smiley o’meter results demonstrate an effective impact on student cognition and motivation.".
- bxt147 abstract "In this paper, we report on the task of near-duplicate photo detection in the context of events that get shared on multiple social networks. When people attend events, they more and more share event-related photos publicly on social networks to let their social network contacts relive and witness the attended events. In the past, we have worked on methods to accumulate such public user-generated multimedia content so that it can be used to summarize events visually in the form of media galleries or slideshows. Therefore, methods for the deduplication of near-duplicate photos of the same event are required in order to ensure the diversity of the generated media galleries or slideshows. First, we introduce the social-network-specific reasons and challenges that cause near-duplicate photos. Second, we introduce an algorithm for the task of deduplicating near-duplicate photos stemming from social networks. Finally, we evaluate the algorithm’s results and shortcomings.".
- fqt067 abstract "Unstructured metadata fields such as “description” offer tremendous value for users to understand cultural heritage objects. However, this type of narrative information is of little direct use within a machine-readable context due to its unstructured nature. This paper explores the possibilities and limitations of Named-Entity Recognition (NER) and Term Extraction (TE) to mine such unstructured metadata for meaningful concepts. These concepts can be used to leverage otherwise limited searching and browsing operations, but they can also play an important role to foster Digital Humanities research. In order to catalyze experimentation with NER and TE, the paper proposes an evaluation of the performance of three third-party entity extraction services through a comprehensive case study, based on the descriptive fields of the Smithsonian Cooper-Hewitt National Design Museum in New York. In order to cover both NER and TE, we first offer a quantitative analysis of named-entities retrieved by the services in terms of precision and recall compared to a manually annotated gold-standard corpus, then complement this approach with a more qualitative assessment of relevant terms extracted. Based on the outcomes of this double analysis, the conclusions present the added value of entity extraction services, but also indicate the dangers of uncritically using NER and/or TE, and by extension Linked Data principles, within the Digital Humanities. All metadata and tools used within the paper are freely available, making it possible for researchers and practitioners to repeat the methodology. By doing so, the paper offers a significant contribution towards understanding the value of entity recognition and disambiguation for the Digital Humanities.".
- JD-03-2017-0040 abstract "The purpose of this paper is to detail a low-cost, low-maintenance publishing strategy aimed at unlocking the value of Linked Data collections held by libraries, archives and museums (LAMs). The shortcomings of commonly used Linked Data publishing approaches are identified, and the current lack of substantial collections of Linked Data exposed by LAMs is considered. To improve on the discussed status quo, a novel approach for publishing Linked Data is proposed and demonstrated by means of an archive of DBpedia versions, which is queried in combination with other Linked Data sources. The authors show that the approach makes publishing Linked Data archives easy and affordable, and supports distributed querying without causing untenable load on the Linked Data sources. The proposed approach significantly lowers the barrier for publishing, maintaining, and making Linked Data collections queryable. As such, it offers the potential to substantially grow the distributed network of queryable Linked Data sources. Because the approach supports querying without causing unacceptable load on the sources, the queryable interfaces are expected to be more reliable, allowing them to become integral building blocks of robust applications that leverage distributed Linked Data sources. The novel publishing strategy significantly lowers the technical and financial barriers that LAMs face when attempting to publish Linked Data collections. The proposed approach yields Linked Data sources that can reliably be queried, paving the way for applications that leverage distributed Linked Data sources through federated querying.".
- JD-07-2013-0098 abstract "The paper revisits a decade after its conception the Representational State Transfer (REST) architectural style and analyses its relevance to address current challenges from the Library and Information Science (LIS) discipline. Conceptual aspects of REST are reviewed and a generic architecture to support REST is presented. The relevance of the architecture is demonstrated with the help of a case-study based on the collection registration database of the Cooper-Hewitt National Design Museum. We argue that the “resources and representations” model of REST is a sustainable way for the management of Web resources in a context of constant technological evolutions. When making information resources available on the Web, a resource-oriented publishing model can avoid the costs associated with the creation of multiple interfaces. This paper re-examines the conceptual merits of REST and translates the architecture into actionable recommendations for the LIS discipline.".
- BigData.2016.7840981 abstract "Topic Modelling (TM) has gained momentum over the last few years within the humanities to analyze topics represented in large volumes of full text. This paper proposes an experiment with the usage of TM based on a large subset of digitized archival holdings of the European Commission (EC). Currently, millions of scanned and OCR’ed files are available and hold the potential to significantly change the way historians of the construction and evolution of the European Union can perform their research. However, due to a lack of resources, only minimal metadata are available on a file and document level, seriously undermining the accessibility of this archival collection. The article explores in an empirical manner the possibilities and limits of TM to automatically extract key concepts from a large body of documents spanning multiple decades. By mapping the topics to headings of the EUROVOC thesaurus, the proof of concept described in this paper offers the future possibility to represent the identified topics with the help of a hierarchical search interface for end-users.".
- COMPSACW.2013.29 abstract "This paper describes the mechanisms involved in accessing provenance on the Web, according to the new W3C PROV specifications, and how end-users can process this information to make basic trust assessments. Additionally, we illustrate this principle by implementing a practical use case, namely Tim Berners-Lee’s vision of the “Oh, yeah?” button, enabling users to make trust assessments about documents on the web. This implementation leverages the W3C PROV specification to provide user-friendly access to the provenance of Web pages. While the extension described in this paper is specific to one browser, the majority of its components are browser-agnostic.".
- FITCE53297.2021.9588540 abstract "The Web is evolving more and more to a small set of walled gardens where a very small number of platforms determine the way that people get access to the Web (i.e. “log in via Facebook”). As a pushback against this ongoing centralization of data and services on the Web, decentralization efforts are taking place that move data into personal data vaults. This positioning paper discusses a potential personal data vault society and the required research steps in order to get there. Emphasis is given on the needed interplay between technological and business research perspectives. The focus is on the situation of Flanders, based on a recent announcement of the Flemish government to set up a Data Utility Company. The concepts as well as the suggested path for adoption can easily be extended, however, to other situations/regions as well.".
- ICSC.2016.55 abstract "In this paper, we propose and investigate a novel distance-based approach for measuring the semantic dissimilarity between two concepts in a knowledge graph. The proposed Normalized Semantic Web Distance (NSWD) extends the idea of the Normalized Web Distance, which is utilized to determine the dissimilarity between two textual terms, and utilizes additional semantic properties of nodes in a knowledge graph. We evaluate our proposal on two different knowledge graphs: Freebase and DBpedia. While the NSWD achieves a correlation of up to 0.58 with human similarity assessments on the established Miller-Charles benchmark of 30 term-pairs on the Freebase knowledge graph, it reaches an even higher correlation of 0.69 in the DBpedia knowledge graph. We thus conclude that the proposed NSWD is an efficient and effective distance-based approach for assessing semantic dissimilarity in very large knowledge graphs.".
- IOT.2014.7030116 abstract "We present an approach that combines semantic metadata and reasoning with a visual modeling tool to enable the goal-driven configuration of smart environments for end users. In contrast to process-driven systems where service mashups are statically defined, this approach makes use of embedded semantic API descriptions to dynamically create mashups that fulfill the user’s goal. The main advantage of the presented system is its high degree of flexibility, as service mashups can adapt to dynamic environments and are fault-tolerant with respect to individual services becoming unavailable. To support end users in expressing their goals, we integrated a visual programming tool with our system. This tool enables users to model the desired state of their smart environment graphically and thus hides the technicalities of the underlying semantics and the reasoning. Possible applications of the presented system include the configuration of smart homes to increase individual well-being, and reconfigurations of smart environments, for instance in the industrial automation or healthcare domains.".
- MC.2014.296 abstract "Open Governments use the Web as a global dataspace for datasets. It is in the interest of these governments to be interoperable with other governments worldwide, yet there is currently no way to identify relevant datasets to be interoperable with and there is no way to measure the interoperability itself. In this article we discuss the possibility of comparing identifiers used within various datasets as a way to measure semantic interoperability. We introduce three metrics to express the interoperability between two datasets: the identifier interoperability, the relevance and the number of conflicts. The metrics are calculated from a list of statements which indicate for each pair of identifiers in the system whether they identify the same concept or not. While a lot of effort is needed to collect these statements, the return is high: not only are relevant datasets identified, but machine-readable feedback is also provided to the data maintainer.".
- NWeSP.2011.6088202 abstract "The social networking website Facebook offers to its users a feature called “status updates” (or just “status”), which allows them to create microposts directed to all their contacts, or a subset thereof. Readers can respond to microposts, or in addition to that also click a “Like” button to show their appreciation for a certain micropost. Adding semantic meaning in the sense of unambiguous intended ideas to such microposts can, for example, be achieved via Natural Language Processing (NLP). Therefore, we have implemented a RESTful mash-up NLP API based on a combination of several third party NLP APIs in order to retrieve more accurate results in the sense of emergence. In consequence, our API uses third party APIs opaquely in the background in order to deliver its output. In this paper, we describe how one can keep track of provenance, and credit back the contributions of each single API to the combined result of all APIs. In addition to that, we show how the existence of provenance metadata can help understand the way a combined result is formed, and optimize the result combination process. Therefore, we use the HTTP Vocabulary in RDF and the Provenance Vocabulary. The main contribution of our work is a description of how provenance metadata can be automatically added to the output of mash-up APIs like the one presented in this paper.".
- NWeSP.2011.6088208 abstract "Hyperlinks and forms let humans navigate with ease through websites they have never seen before. In contrast, automated agents can only perform preprogrammed actions on Web services, reducing their generality and restricting their usefulness to a specialized domain. Many of the employed services call themselves RESTful, although they neglect the hypermedia constraint as defined by Roy T. Fielding, stating that the application state should be driven by hypertext. This lack of link usage on the Web of services severely limits agents in what they can do, while connectedness forms a primary feature of the human Web. An urgent need for more intelligent agents becomes apparent, and in this paper, we demonstrate how the conjunction of functional service descriptions and hypermedia links leads to advanced, interactive agent behavior. We propose a new mode for our previously introduced semantic service description format RESTdesc, providing the mechanisms for agents to consume Web services based on links, similar to human browsing strategies. We illustrate the potential of these descriptions by a use case that shows the enhanced capabilities they offer to automated agents, and explain how this is vital for the future Web.".
- QoMEX.2012.6263875 abstract "In this paper, we present and define aesthetic principles for the automatic generation of media galleries based on media items retrieved from social networks that—after a ranking and pruning step—can serve to authentically summarize events and their atmosphere from a visual and an audial standpoint.".
- TASE.2016.2533321 abstract "One of the central research challenges in the Internet of Things and Ubiquitous Computing domains is how users can be enabled to “program” their personal and industrial smart environments by combining services that are provided by devices around them. We present a service composition system that enables the goal-driven configuration of smart environments for end users by combining semantic metadata and reasoning with a visual modeling tool. In contrast to process-driven approaches where service mashups are statically defined, we make use of embedded semantic API descriptions to dynamically create mashups that fulfill the user’s goal. The main advantage of our system is its high degree of flexibility, as service mashups can adapt to dynamic environments and are fault-tolerant with respect to individual services becoming unavailable. To support users in expressing their goals, we integrated a visual programming tool with our system that allows users to model the desired state of a smart environment graphically, thereby hiding the technicalities of the underlying semantics. Possible applications of the presented system include the management of smart homes to increase individual well-being, and reconfigurations of smart environments, for instance in the industrial automation or healthcare domains.".
- 2307819.2307828 abstract "The early visions for the Semantic Web, from the famous 2001 Scientific American article by Berners-Lee et al., feature intelligent agents that can autonomously perform tasks like discovering information, scheduling events, finding execution plans for complex operations, and in general, use reasoning techniques to come up with sense-making and traceable decisions. While today—more than ten years later—the building blocks (1) resource-oriented REST infrastructure, (2) Web APIs, and (3) Linked Data are in place, the envisioned intelligent agents have not landed yet. In this paper, we explain why capturing functionality is the connection between those three building blocks, and introduce the functional API description format RESTdesc that creates this bridge between hypermedia APIs and the Semantic Web. Rather than adding yet another component to the Semantic Web stack, RESTdesc offers instead concise descriptions that reuse existing vocabularies to guide hypermedia-driven agents. Its versatile capabilities are illustrated by a real-life agent use case for Web browsers wherein we demonstrate that RESTdesc functional descriptions are capable of fulfilling the promise of autonomous agents on the Web.".
- 2487788.2488182 abstract "Hypermedia links and controls drive the Web by transforming information into affordances through which users can choose actions. However, publishers of information cannot predict all actions their users might want to perform and therefore, hypermedia can only serve as the engine of application state to the extent the user’s intentions align with those envisioned by the publisher. In this paper, we introduce distributed affordance, a concept and architecture that extends application state to the entire Web. It combines information inside the representation and knowledge of action providers to generate affordance from the user’s perspective. Unlike similar approaches such as Web Intents, distributed affordance scales both in the number of actions and the number of action providers, because it is resource-oriented instead of action-oriented. A proof-of-concept shows that distributed affordance is a feasible strategy on today’s Web.".
- 2806416.2806642 abstract "In order to assess the trustworthiness of information on social media, a consumer needs to understand where this information comes from, and which processes were involved in its creation. The entities, agents and activities involved in the creation of a piece of information are referred to as its provenance, which was standardized by W3C PROV. However, current social media APIs cannot always capture the full lineage of every message, leaving the consumer with incomplete or missing provenance, which is crucial for judging the trust it carries. Therefore in this paper, we propose an approach to reconstruct the provenance of messages on social media on multiple levels. To obtain a fine-grained level of provenance, we use an existing approach to reconstruct information cascades with high certainty, and map them to PROV using the PROV-SAID extension for social media. To obtain a coarse-grained level of provenance, we adapt a similarity-based, fuzzy provenance reconstruction approach – previously applied on news. We illustrate our approach by providing the reconstructed provenance of a limited social media dataset gathered during the 2012 Olympics, for which we were able to reconstruct a significant amount of previously unidentified connections.".
- 2814864.2814873 abstract "The RDF data model allows the description of domain-level knowledge that is understandable by both humans and machines. RDF data can be derived from different source formats and diverse access points, ranging from databases or files in CSV format to data retrieved from Web APIs in JSON, Web Services in XML or any other speciality formats. To this end, vocabularies such as RML were introduced to uniformly define how data in multiple heterogeneous sources is mapped to the RDF data model, independently of their original format. This approach results in mapping definitions that are machine-processable and interoperable. However, the way in which this data is accessed and retrieved still remains hard-coded, as corresponding descriptions are often not available or not taken into account. In this paper, we introduce an approach that takes advantage of widely-accepted vocabularies, originally used to advertise services or datasets, such as Hydra or DCAT, to define how to access Web-based or other data sources. Consequently, the generation of RDF representations is facilitated, as the description of the interaction models with the original data remains independent, interoperable and granular.".
- 2814864.2814892 abstract "A lot of Linked Data on the Web is dynamic. Despite the existing query solutions and implementations, crucial unresolved issues remain. This poster presents a novel approach to update SPARQL results client-side by exchanging fragment patches. We aim at a sustainable solution which balances the load and reduces bandwidth. Therefore, our approach benefits from reusing unchanged data and minimizing data transfer size. By only working with patches, the load on the server is minimal. Also, the bandwidth usage is low, since only relevant changes are transferred to the client.".
- 2872518.2891069 abstract "In the field of smart cities, researchers need an indication of how people move in and between cities. Yet, getting statistics of travel flows within public transit systems has proven to be troublesome. In order to get an indication of public transit travel flows in Belgium, we analyzed the query logs of the iRail API, a highly expressive route planning API for the Belgian railways. We were able to study ∼100k to 500k requests for each month between October 2012 and November 2015, which is between 0.56% and 1.66% of the amount of monthly passengers. Using data visualizations, we illustrate the commuting patterns in Belgium and confirm that Brussels, the capital, acts as a central hub. The Flemish region appears to be polycentric, while in the Walloon region, everything converges on Brussels. The findings correspond to the real travel demand, according to experts of the passenger federation Trein Tram Bus. We conclude that query logs of route planners are of high importance in getting an indication of travel flows. However, better travel intentions would be acquirable using dedicated HTTP POST requests.".
- 2910019.2910027 abstract "Public administrations are often still organised in vertical, closed silos. The lack of common data standards (common data models and reference data) for exchanging information between administrations in a cross-domain and/or cross-border setting stands in the way of digital public services and automated flow of information between public administrations. Core data models address this issue, but are often created within the closed environment of a country or region and within one policy domain. A lack of insight exists in understanding and managing the life-cycle of these initiatives on public administration information systems for data modelling and data exchange. In this paper, we outline state-of-the-art implementations and vocabularies linked to the core data models. In particular we inventoried and selected existing core data models and identified tendencies in current practices based on the criteria creation, use, maintenance and coordination. Based on this, this survey suggests policy and information management recommendations pointing to best practices regarding core data model implementations and their role in linking isolated data silos. Finally we highlight the differences in their coordination and maintenance, depending on the creation and use.".
- 2910019.2910030 abstract "Travelers expect access to tourism information at anytime, anywhere, with any media. Mobile tourism guides, accessible via the Web, provide an omnipresent approach to this. Thereby it is expensive and not trivial to (re)model, translate and transform data over and over. This inhibits many players, including governments, in developing such applications. We report on our experience in running a project on mobile tourism in Flanders, Belgium where we develop a methodology and reusable formalization for the data disclosure. We apply open data standards to achieve a reusable and interoperable datahub for mobile tourism. We organized working groups resulting in a re-usable formal specification and serialization of the domain model that is immediately usable for building mobile tourism applications. This increased the awareness and led to semantic convergence which is forming a regional foundation to develop sustainable mobile guides for tourism.".
- 2928294.2928304 abstract "Linked Data storage solutions often optimize for low latency querying and quick responsiveness. Meanwhile, in the back-end, offline ETL processes take care of integrating and preparing the data. In this paper we explain a workflow and the results of a benchmark that examines which Linked Data storage solution and setup should be chosen for different dataset sizes to optimize the cost-effectiveness of the entire ETL process. The benchmark executes diversified stress tests on the storage solutions. The results include an in-depth analysis of four mature Linked Data solutions with commercial support and full SPARQL 1.1 compliance. Whereas traditional benchmark studies generally deploy the triple stores on premises using high-end hardware, this benchmark uses publicly available cloud machine images for reproducibility and runs on commodity hardware. All stores are tested using their default configuration. In this setting Virtuoso shows the best performance in general. The other three stores show competitive results and have disjunct areas of excellence. Finally, it is shown that each store’s performance heavily depends on the structural properties of the queries, giving an indication of where vendors can focus their optimization efforts.".
- 3014087.3014096 abstract "Each government level uses its own information system. At the same time citizens expect a user-centric approach and instant access to their data or to open government data. Therefore the applications at various government levels need to be interoperable in support of the “once-only principle”: data is inputted and registered only once and then reused. Given government budget constraints and it being an expensive and non-trivial task to (re)model, translate and transform data over and over, public administrations need to reduce interoperability costs. This is achieved by semantically aligning the information between the different information systems of each government level. Semantically interoperable systems facilitate citizen-centered e-government services. This paper illustrates how the OSLO program paved the way bottom-up from a broad basis of stakeholders towards a government-endorsed strategy. OSLO applied a more generic process and methodology and provided practical insights on how to overcome the encountered hurdles: political support and adoption; reaching semantic agreement. The lessons learned in the region of Flanders, Belgium can speed-up the process in other countries that face the complexity of integrating information intensive processes between different applications, administrations and government levels.".
- 3041021.3051699 abstract "On dense railway networks—such as in Belgium—train travelers are frequently confronted with overly occupied trains, especially during peak hours. Crowdedness on trains leads to a deterioration in the quality of service and has a negative impact on the well-being of the passenger. In order to stimulate travelers to consider less crowded trains, the iRail project wants to show an occupancy indicator in their route planning applications by the means of predictive modelling. As there is no official occupancy data available, training data is gathered by crowd sourcing using the Web app iRail.be and the Railer application for iPhone. Users can indicate their departure & arrival station, at what time they took a train and classify the occupancy of that train into the classes: low, medium or high. While preliminary results on a limited data set conclude that the models do not yet perform sufficiently well, we are convinced that with further research and a larger amount of data, our predictive model will be able to achieve higher predictive performances. All datasets used in the current research are, for that purpose, made publicly available under an open license on the iRail website and in the form of a Kaggle competition. Moreover, an infrastructure is set up that automatically processes new logs submitted by users in order for our model to continuously learn. Occupancy predictions for future trains are made available through an API.".
- 3041021.3054210 abstract "A large amount of public transport data is made available by many different providers, which makes RDF a great method for integrating these datasets. Furthermore, this type of data provides a great source of information that combines both geospatial and temporal data. These aspects are currently undertested in RDF data management systems, because of the limited availability of realistic input datasets. In order to bring public transport data to the world of benchmarking, we need to be able to create synthetic variants of this data. In this paper, we introduce a dataset generator with the capability to create realistic public transport data. This dataset generator, and the ability to configure it on different levels, makes it easier to use public transport data for benchmarking with great flexibility.".
- 3148011.3154467 abstract "While humans browse the Web by following links, these hypermedia links can also be used by machines for browsing. While efforts such as Hydra semantically describe the hypermedia controls on Web interfaces to enable smarter interface-agnostic clients, they are largely limited to the input parameters to interfaces, and clients therefore do not know what response to expect from these interfaces. In order to convey such expectations, interfaces need to declaratively describe the response structure of their parameterized hypermedia controls. We therefore explored techniques to represent this parameterized response structure in a generic but expressive way. In this work, we discuss four different approaches for declaring a response structure, and we compare them based on a model that we introduce. Based on this model, we conclude that a SHACL shape-based approach can be used for declaring such a parameterized response structure, as it conforms to the REST architectural style that has helped shape the Web into its current form.".
- 3184558.3190666 abstract "Structured Knowledge on the Web had an intriguing history before it became successful. We briefly revisit this history, before we go into the longer discussion about how structured knowledge on the Web should be devised such that it benefits even more applications. Core to this discussion will be issues like trust, information infrastructure usability and resilience, promising realms of structured knowledge and principles and practices of data sharing.".
- 3184558.3191650 abstract "For smart decision making, user agents need real-time and historic access to open data from sensors installed in the public domain. In contrast to a closed environment, for Open Data and federated query processing algorithms, the data publisher cannot anticipate specific questions in advance, nor can it deal with a bad cost-efficiency of the server interface when data consumers increase. When publishing observations from sensors, different fragmentation strategies can be thought of depending on how the historic data needs to be queried. Furthermore, both publish/subscribe and polling strategies exist to publish real-time updates. Each of these strategies comes with its own trade-offs regarding cost-efficiency of the server-interface, user-perceived performance and CPU use. A polling strategy where multiple observations are published in a paged collection was tested in a proof of concept for parking space availabilities. In order to understand the different resource trade-offs presented by publish/subscribe and polling publication strategies, we devised an experiment on two machines, for a scalability test. The preliminary results were inconclusive and suggest more large scale tests are needed in order to see a trend. While the large-scale tests will be performed in future work, the proof of concept helped to identify the technical Open Data principles for the 13 biggest cities in Flanders.".
- 3308560.3316520 abstract "For better traffic flow and making better policy decisions, the city of Antwerp is connecting traffic lights to the Internet. The live “time to green” only tells a part of the story: the historical values also need to be preserved and made accessible to everyone. We propose (i) an ontology for describing the topology of an intersection and the signal timing of traffic lights, (ii) a specification to publish these historical and live data with Linked Data Fragments and (iii) a method to preserve the published data in the long-term. We showcase the applicability of our specification with the opentrafficlights.org project where an end-user can see the live count-down as well as a line chart showing the historic “time to green” of a traffic light. We found that publishing traffic lights data as time sorted Linked Data Fragments allows synchronizing and reusing an archive to retrieve historical observations. Long-term preservation with tape storage becomes feasible when archives shift from byte preservation to knowledge preservation by combining Linked Data Fragments.".
- 3323878.3325802 abstract "To unlock the value of increasingly available data in high volumes, we need flexible ways to integrate data across different sources. While semantic integration can be provided through RDF generation, current generators scale insufficiently in terms of volume, as they are limited by memory constraints. Therefore, we developed the RMLStreamer, a generator that parallelizes the ingestion and mapping tasks of RDF generation across multiple instances. In this paper, we analyze which aspects are parallelizable and we introduce an approach for parallel RDF generation. We describe how we implemented our proposed approach in the RMLStreamer, and how the resulting scaling behavior compares to other RDF generators. The RMLStreamer ingests data at a 50% faster rate than existing generators through parallel ingestion.".
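The parallelization idea can be sketched in a few lines: partition the input records and let each partition be mapped to triples independently, so memory use stays bounded per worker. This is only a conceptual sketch under simplified assumptions; RMLStreamer itself executes declarative RML mappings on a stream-processing framework rather than the hard-coded mapping shown here.

```typescript
// Conceptual sketch of parallel RDF generation: records are partitioned across
// workers, and each worker applies the mapping independently. Names and the
// mapping logic are illustrative; this is not RMLStreamer's actual code.
type SourceRecord = { [field: string]: string };
type Triple = { subject: string; predicate: string; object: string };

function mapRecord(record: SourceRecord): Triple[] {
  // A real mapping would be driven by declarative rules (e.g. RML);
  // here a single subject/predicate pattern is hard-coded for illustration.
  return [{
    subject: `http://example.org/person/${record.id}`,
    predicate: 'http://xmlns.com/foaf/0.1/name',
    object: `"${record.name}"`,
  }];
}

function generateInParallel(records: SourceRecord[], workers: number): Triple[][] {
  // Partition the input; in a real system each partition would be handled by a
  // separate worker or task rather than by a loop in one process.
  const partitions: SourceRecord[][] = Array.from({ length: workers }, () => []);
  records.forEach((r, i) => partitions[i % workers].push(r));
  return partitions.map(partition => partition.flatMap(mapRecord));
}
```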
- 3366424.3383528 abstract "In healthcare, the aging of the population is resulting in a gradual shift from residential care to home care, requiring reliable follow-up of elderly people by a whole network of care providers. The environment of these patients is increasingly being equipped with different monitoring devices, which make it possible to obtain insight into the current condition of the patient and their environment. However, current monitoring platforms that support care providers are centralized and not personalized, reducing performance, scalability, autonomy and privacy. Because the available data is only exposed through custom APIs, profile knowledge cannot be efficiently exchanged, which is required to provide optimal care. Therefore, this paper presents a distributed data-driven platform, built on Semantic Web technologies, that enables the integration of all profile knowledge in order to deliver personalized continuous home care. It provides a distributed smart monitoring service, which makes it possible to locally monitor only the relevant sensors according to the patient’s profile and to infer personalized decisions when analyzing events. Moreover, it provides a medical treatment planning service, which composes treatment plans tailored to each patient, including personalized quality of service parameters that allow a doctor to select the optimal plan. To illustrate how the platform delivers these services, the paper also presents a demonstrator using a realistic home care scenario.".
- 3487553.3524630 abstract "The Solid project aims to restore end-users’ control over their data by decoupling services and applications from data storage. To realize data governance by the user, the Solid Protocol 0.9 relies on Web Access Control, whose expressivity and interpretability are limited. In contrast, recent privacy and data protection regulations impose strict requirements on personal data processing applications and the scope of their operation. The Web Access Control mechanism lacks the granularity and contextual awareness needed to enforce these regulatory requirements. Therefore, we suggest a possible architecture for relating Solid’s low-level technical access control rules with higher-level concepts such as the legal basis and purpose for data processing, the abstract types of information being processed, and the data sharing preferences of the data subject. Our architecture combines recent technical efforts by the Solid community panels with prior proposals made by researchers on the use of ODRL and SPECIAL policies as an extension to Solid’s authorization mechanism. While our approach appears to avoid a number of pitfalls identified in previous research, further work is needed before it can be implemented and used in a practical setting.".
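To give a flavour of the higher-level concepts mentioned above, the sketch below shows what a purpose-constrained read permission could look like when expressed as an ODRL policy in JSON-LD, sitting alongside a low-level access control rule. The resource, assignee, and purpose IRIs are hypothetical, and the paper's actual architecture may model such policies differently.

```typescript
// Hedged sketch of a higher-level policy next to a WAC rule, expressed as an
// ODRL policy in JSON-LD. The target, assignee, and purpose IRIs are made up.
const policy = {
  '@context': 'http://www.w3.org/ns/odrl.jsonld',
  '@type': 'Agreement',
  uid: 'https://pod.example/policies/health-data-research',
  permission: [{
    target: 'https://pod.example/health/',        // data in the user's pod
    assignee: 'https://hospital.example/profile#app',
    action: 'read',
    constraint: [{
      leftOperand: 'purpose',                     // why the data may be processed
      operator: 'eq',
      rightOperand: 'https://example.org/purposes#research',
    }],
  }],
};

console.log(JSON.stringify(policy, null, 2));
```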
- 3543873.3589756 abstract "Knowledge Graphs have become a foundation for sharing data on the web and building intelligent services across many sectors and also within some of the most successful corporations in the world. The over-centralisation of data on the web, however, has been raised as a concern by a number of prominent researchers in the field. For example, at the beginning of 2022 a €2.7B civil lawsuit was launched against Meta on the basis that it has abused its market dominance to impose unfair terms and conditions on UK users in order to exploit their personal data. Data centralisation can lead to a number of problems including: lock-in/siloing effects, lack of user control over their personal data, limited incentives and opportunities for interoperability and openness, and the resulting detrimental effects on privacy and innovation. A number of diverse approaches and technologies exist for decentralising data, such as federated querying and distributed ledgers. The main question is, though, what does decentralisation really mean for web data and Knowledge Graphs? What are the main issues and tradeoffs involved? These questions and others are addressed in this workshop.".
- 150696 abstract "In this paper, the design of a service-oriented architecture for multi-sensor surveillance in smart homes is presented as an integrated solution enabling automatic deployment, dynamic selection and composition of sensors. Sensors are implemented as web-connected devices, with a uniform Web API. RESTdesc is used to describe the sensors and a novel solution is presented to automatically compose Web APIs that can be applied with existing Semantic Web reasoners. We evaluated the solution by building a smart Kinect sensor that is able to dynamically switch between IR and RGB, and by optimizing person detection through feedback from pressure sensors, thus demonstrating the collaboration among sensors to enhance the detection of complex events. The performance results show that the platform scales to many Web APIs, as composition time remains limited to a few hundred milliseconds in almost all cases.".
- CSIS151028031D abstract "Recent developments on sharing research results and ideas on the Web, such as research collaboration platforms like Mendeley or ResearchGate, enable novel ways to explore research information. Current search interfaces in this field focus mostly on narrowing down the search scope through faceted search, keyword matching, or filtering. The interactive visual aspect and the focus on exploring relationships between items in the results have not sufficiently been addressed before. To facilitate this exploration, we developed ResXplorer, a search interface that interactively visualizes linked data of research-related sources. By visualizing resources such as conferences, publications and proceedings, we reveal relationships between researchers and those resources. We evaluate our search interface by measuring how it affects the search productivity of targeted lean users. Furthermore, expert users reviewed its information retrieval potential and compared it against both popular academic search engines and highly specialized academic search interfaces. The results indicate how well lean users perceive the system and how expert users rate it for its main goal: revealing relationships between resources for researchers.".
- 978-1-61499-562-3-37 abstract "Various methods are needed to extract information from current (digital) comics. Furthermore, the use of different (proprietary) formats by comic distribution platforms causes an overhead for authors. To overcome these issues, we propose a solution that makes use of the EPUB 3 specification, additionally leveraging the Open Web Platform to support animations, reading assistance, audio and multiple languages in a single format, by using our JavaScript library comicreader.js. We also provide administrative and descriptive metadata in the same format by introducing a new ontology: Dicera. Our solution is complementary to the current extraction methods, on the one hand because they can help with metadata creation, and on the other hand because the machine-understandable metadata facilitates their use. While the reading system support for our solution is currently limited, it can offer all features needed by current comic distribution platforms. When comparing comics generated by our solution to EPUB 3 textbooks, we observed an increase in file size, mainly due to the use of images. In future work, our solution can be further improved by extending the presentation features, investigating different types of comics, studying the use of new EPUB 3 extensions, and by incorporating it in digital book authoring environments.".
- SW-180319 abstract "When benchmarking RDF data management systems such as public transport route planners, system evaluation needs to happen under various realistic circumstances, which requires a wide range of datasets with different properties. Real-world datasets are almost ideal, as they offer these realistic circumstances, but they are often hard to obtain and inflexible for testing. For these reasons, synthetic dataset generators are typically preferred over real-world datasets due to their intrinsic flexibility. Unfortunately, many synthetic datasets that are generated within benchmarks are insufficiently realistic, raising questions about the generalizability of benchmark results to real-world scenarios. In order to benchmark geospatial and temporal RDF data management systems such as route planners with sufficient external validity and depth, we designed PODIGG, a highly configurable generation algorithm for synthetic public transport datasets with realistic geospatial and temporal characteristics comparable to those of their real-world variants. The algorithm is inspired by real-world public transit network design and scheduling methodologies. This article discusses the design and implementation of PODIGG and validates the properties of its generated datasets. Our findings show that the generator achieves a sufficient level of realism, based on the existing coherence metric and new metrics we introduce specifically for the public transport domain. Thereby, PODIGG provides a flexible foundation for benchmarking RDF data management systems with geospatial and temporal data.".
- SW-190358 abstract "Knowledge graphs, which contain annotated descriptions of entities and their interrelations, are often generated using rules that apply semantic annotations to certain data sources. (Re)using ontology terms without adhering to the axioms defined by their ontologies results in inconsistencies in these graphs, affecting their quality. Methods and tools were proposed to detect and resolve inconsistencies, the root causes of which include rules and ontologies. However, these either require access to the complete knowledge graph, which is not always available in a time-constrained situation, or assume that only generation rules can be refined but not ontologies. In the past, we proposed a rule-driven method for detecting and resolving inconsistencies without complete knowledge graph access, but it requires a predefined set of refinements to the rules and does not guide users with respect to the order in which the rules should be inspected. We extend our previous work with a rule-driven method, called Resglass, that considers refinements for generation rules as well as ontologies. In this article, we describe Resglass, which includes a ranking to determine the order in which rules and ontology elements should be inspected, and its implementation. The ranking is evaluated by comparing the manual ranking of experts to our automatic ranking. The evaluation shows that our automatic ranking achieves an overlap of 80% with the experts’ ranking, thereby reducing the effort required during the resolution of inconsistencies in both rules and ontologies.".
- SW-200384 abstract "The correct functioning of Semantic Web applications requires that given RDF graphs adhere to an expected shape. This shape depends on the RDF graph and the application’s supported entailments of that graph. During validation, RDF graphs are assessed against sets of constraints, and found violations help refine the RDF graphs. However, existing validation approaches cannot always explain the root causes of violations (inhibiting refinement), and cannot fully match the entailments supported during validation with those supported by the application. As a result, these approaches cannot accurately validate RDF graphs, or they need to combine multiple systems, which deteriorates the validator’s performance. In this paper, we present an alternative validation approach using rule-based reasoning, capable of fully customizing the used inferencing steps. We compare our approach to existing ones, and present a formal grounding and a practical implementation, “Validatrr”, based on N3Logic and the EYE reasoner. Our approach – supporting an equivalent number of constraint types compared to the state of the art – better explains the root cause of the violations due to the reasoner’s generated logical proof, and returns an accurate number of violations due to the customizable inferencing rule set. Performance evaluation shows that Validatrr is performant for smaller datasets, and scales linearly w.r.t. the RDF graph size. The detailed root cause explanations can guide future validation report description specifications, and the fine-grained level of configuration can be employed to support different constraint languages. This foundation allows further research into handling recursion, validating RDF graphs based on their generation description, and providing automatic refinement suggestions.".
- SW-210449 abstract "Linked Open Datasets on the Web that are published as RDF can evolve over time. There is a need to be able to store such evolving RDF datasets, and query across their versions. Different storage strategies are available for managing such versioned datasets, each being efficient for specific types of versioned queries. In recent work, a hybrid storage strategy has been introduced that combines these different strategies to lead to more efficient query execution for all versioned query types at the cost of increased ingestion time. While this trade-off is beneficial in the context of Web querying, it suffers from exponential ingestion times in terms of the number of versions, which becomes problematic for RDF datasets with many versions. As such, there is a need for an improved storage strategy that scales better in terms of ingestion time for many versions. We have designed, implemented, and evaluated a change to the hybrid storage strategy where we make use of a bidirectional delta chain instead of the default unidirectional delta chain. In this article, we introduce a concrete architecture for this change, together with accompanying ingestion and querying algorithms. Experimental results from our implementation show that the ingestion time is significantly reduced. As an additional benefit, this change also leads to lower total storage size and even improved query execution performance in some cases. This work shows that modifying the structure of delta chains within the hybrid storage strategy can be highly beneficial for RDF archives. In future work, other modifications to this delta chain structure deserve to be investigated, to further improve the scalability of ingestion and querying of datasets with many versions.".
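The difference between the delta-chain layouts can be pictured with a small sketch: a snapshot of one version plus deltas that are stored relative to it, from which any version can be materialized. In a unidirectional chain the snapshot is the first version; in a bidirectional chain it sits in the middle so deltas can point backward or forward to it. This is a conceptual illustration only, not OSTRICH's actual storage format.

```typescript
// Conceptual sketch of a delta chain (not OSTRICH's storage layout): one
// version is stored fully as a snapshot, and every other version is stored as
// an aggregated delta relative to that snapshot.
type StoredTriple = string;                  // simplified: a triple as a string
type Delta = { additions: StoredTriple[]; deletions: StoredTriple[] };

interface DeltaChain {
  snapshotVersion: number;                   // the version stored fully
  snapshot: Set<StoredTriple>;
  deltas: Map<number, Delta>;                // other versions, relative to the snapshot
}

function materialize(chain: DeltaChain, version: number): Set<StoredTriple> {
  const result = new Set(chain.snapshot);
  if (version === chain.snapshotVersion) return result;
  const delta = chain.deltas.get(version);
  if (!delta) throw new Error(`Unknown version ${version}`);
  for (const t of delta.deletions) result.delete(t);
  for (const t of delta.additions) result.add(t);
  return result;
}
```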
- SW-210450 abstract "The quality of knowledge graphs can be assessed by a validation against specified constraints, typically use-case specific and modeled by human users in a manual fashion. Visualizations can improve the modeling process as they are specifically designed for human information processing, possibly leading to more accurate constraints, and in turn higher quality knowledge graphs. However, it is currently unknown how such visualizations support users when viewing RDF constraints as no scientific evidence for the visualizations’ effectiveness is provided. Furthermore, some of the existing tools are likely suboptimal, as they lack support for edit operations or common constraint types. To establish a baseline, we have defined visual notations to represent RDF constraints and implemented them in UnSHACLed, a tool that is independent of a concrete RDF constraint language. In this paper, we (i) present two visual notations that support all SHACL core constraints, built upon the commonly used visualizations VOWL and UML, (ii) analyze both notations based on cognitively effective design principles, (iii) perform a comparative user study between both visual notations, and (iv) present our open source tool UnSHACLed incorporating our efforts. Users were presented with RDF constraints in both visual notations and had to answer questions based on visualization task taxonomies. Although no statistically significant difference in mean error rates was observed, all study participants preferred ShapeVOWL in a self-assessment to answer RDF constraint-related questions. Furthermore, ShapeVOWL adheres to more cognitively effective design principles according to our comparison. Study participants argued that the increased visual features of ShapeVOWL made it easier to spot constraints, but that a list of constraints – as in ShapeUML – is easier to read; they also suggested that deviating more from the strict UML specification and introducing more visual features could improve ShapeUML. From these findings we conclude that ShapeVOWL has a higher potential to represent RDF constraints more effectively than ShapeUML, but also that the clear and efficient text encoding of ShapeUML can be improved with visual features. A one-size-fits-all approach to RDF constraint visualization and editing will be insufficient. Therefore, to support different audiences and use cases, user interfaces of RDF constraint editors need to support different visual notations.".
- SW-222945 abstract "A common practice within object-oriented software is using composition to realize complex object behavior in a reusable way. Such compositions can be managed by Dependency Injection (DI), a popular technique in which components only depend on minimal interfaces and have their concrete dependencies passed into them. Instead of requiring program code, this separation enables describing the desired instantiations in declarative configuration files, such that objects can be wired together automatically at runtime. Configurations for existing DI frameworks typically only have local semantics, which limits their usage in other contexts. Yet some cases require configurations outside of their local scope, such as for the reproducibility of experiments, static program analysis, and semantic workflows. As such, there is a need for globally interoperable, addressable, and discoverable configurations, which can be achieved by leveraging Linked Data. We created Components.js as an open-source semantic DI framework for TypeScript and JavaScript applications, providing global semantics via Linked Data-based configuration files. In this article, we report on the Components.js framework by explaining its architecture and configuration, and discuss its impact by mentioning where and how applications use it. We show that Components.js is a stable framework that has seen significant uptake during the last couple of years. We recommend it for software projects that require high flexibility, configuration without code changes, sharing configurations with others, or applying these configurations in other contexts such as experimentation or static program analysis. We anticipate that Components.js will continue driving concrete research and development projects that require high degrees of customization to facilitate experimentation and testing, including the Comunica query engine and the Community Solid Server for decentralized data publication.".
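The core idea of declaratively describing instantiations can be sketched as follows: a configuration document names a component type and its parameters, and a generic factory wires the object at runtime. The `@context`, type names, and parameter names below are hypothetical and deliberately simplified; they do not reflect Components.js's actual configuration format or API.

```typescript
// Toy sketch of declarative dependency injection: a JSON-LD-style document
// describes which component to instantiate and with which parameters, and a
// generic factory wires it at runtime. All identifiers here are made up.
const config = {
  '@context': 'https://example.org/contexts/my-module.jsonld',
  '@id': 'urn:my-experiment:parser',
  '@type': 'JsonParser',
  prettyPrint: true,
};

class JsonParser {
  constructor(public options: { prettyPrint: boolean }) {}
  serialize(data: unknown): string {
    return JSON.stringify(data, null, this.options.prettyPrint ? 2 : 0);
  }
}

// A registry maps component types to constructors; a semantic DI framework
// would resolve this from the module's published description instead.
const registry: Record<string, (cfg: any) => unknown> = {
  JsonParser: cfg => new JsonParser({ prettyPrint: cfg.prettyPrint }),
};

const instance = registry[config['@type']](config) as JsonParser;
console.log(instance.serialize({ hello: 'world' }));
```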
- SW-223116 abstract "Publishing transport data on the Web for consumption by others poses several challenges for data publishers. In addition to planned schedules, access to live schedule updates (e.g. delays or cancellations) and historical data is fundamental to enable reliable applications and to support machine learning use cases. However, publishing such dynamic data further increases the computational burden for data publishers, resulting in often unavailable historical data and live schedule updates for most public transport networks. In this paper, we apply and extend the current Linked Connections approach for static data to also support cost-efficient live and historical public transport data publishing on the Web. Our contributions include (i) a reference specification and system architecture to support cost-efficient publishing of dynamic public transport schedules and historical data; (ii) empirical evaluations on route planning query performance based on data fragmentation size, publishing costs and a comparison with a traditional route planning engine such as OpenTripPlanner; (iii) an analysis of potential correlations of query performance with particular public transport network characteristics such as size, average degree, density, clustering coefficient and average connection duration. Results confirm that fragmentation size influences route planning query performance and converges on an optimal fragment size per network. Size (stops), density and connection duration also show correlation with route planning query performance. Our approach proves to be more cost-efficient and in some cases outperforms OpenTripPlanner when supporting the earliest arrival time route planning use case. Moreover, the cost of publishing live and historical schedules remains in the same order of magnitude for server-side resources compared to publishing planned schedules only. Yet, further optimizations are needed for larger networks (> 1000 stops) to be useful in practice. Additional dataset fragmentation strategies (e.g. geospatial) may be studied for designing more scalable and performant Web APIs that adapt to particular use cases, not only limited to the public transport domain.".
- SW-233396 abstract "In many industries, multiple parties collaborate on a larger project. At the same time, each of those stakeholders participates in multiple independent projects simultaneously. A double patchwork can thus be identified, with a many-to-many relationship between actors and collaborative projects. One key example is the construction industry, where every project is unique, involving specialists for many subdomains, ranging from the architectural design over technical installations to geospatial information, governmental regulation and sometimes even historical research. A digital representation of this process and its outcomes requires semantic interoperability between these subdomains, which however often work with heterogeneous and unstructured data. In this paper we propose to address this double patchwork via a decentralized ecosystem for multi-stakeholder, multi-industry collaborations dealing with heterogeneous information snippets. At its core, this ecosystem, called ConSolid, builds upon the Solid specifications for Web decentralization, but extends these both on a (meta)data pattern level and on microservice level. To increase the robustness of data allocation and filtering, we identify the need to go beyond Solid’s current LDP-inspired interfaces to a Solid Pod and introduce the concept of metadata-generated ‘virtual views’, to be generated using an access-controlled SPARQL interface to a Pod. A recursive, scalable way to discover multi-vault aggregations is proposed, along with data patterns for connecting and aligning heterogeneous (RDF and non-RDF) resources across vaults in a mediatype-agnostic fashion. We demonstrate the use and benefits of the ecosystem using minimal running examples, concluding with the setup of an example use case from the Architecture, Engineering, Construction and Operations (AECO) industry.".
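One way to picture the metadata-generated “virtual views” mentioned above is as a stored SPARQL CONSTRUCT query evaluated against a vault's access-controlled SPARQL interface, as sketched below. The endpoint, project IRI, and vocabulary are hypothetical assumptions; ConSolid's actual view mechanism may differ.

```typescript
// Hedged sketch of a "virtual view": a stored SPARQL CONSTRUCT query that,
// when evaluated against a vault's access-controlled SPARQL interface, yields
// a filtered subgraph for one project. Endpoint and IRIs are made up.
const viewQuery = `
PREFIX dcterms: <http://purl.org/dc/terms/>
CONSTRUCT { ?s ?p ?o }
WHERE {
  ?s ?p ?o ;
     dcterms:isPartOf <https://vault.example/projects/station-renovation> .
}`;

async function materializeView(sparqlEndpoint: string): Promise<string> {
  const response = await fetch(sparqlEndpoint, {
    method: 'POST',
    headers: {
      'content-type': 'application/sparql-query',
      accept: 'text/turtle',
    },
    body: viewQuery,
  });
  return response.text(); // the view as an RDF document
}
```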
- info3040790 abstract "Metadata has been around and has evolved for centuries, albeit not recognized as such. Medieval manuscripts typically had illuminations at the start of each chapter, being both a kind of signature for the author writing the script and a pictorial chapter anchor for the illiterates at the time. Nowadays, there is so much fragmented information on the Internet that users sometimes fail to distinguish the real facts from some bent truth, let alone being able to interconnect different facts. Here, metadata can act both as a noise-reductor for detailed recommendations to the end-users and as a catalyst to interconnect related information. Over time, metadata has thus not only had different modes of information; furthermore, metadata’s relation of information to meaning, i.e., “semantics”, has evolved. Darwin’s evolutionary propositions, from “species have an unlimited reproductive capacity”, over “natural selection”, to “the cooperation of mutations leads to adaptation to the environment” show remarkable parallels to both metadata’s different modes of information and to its relation of information to meaning over time. In this paper, we will show that the evolution of the use of (meta)data can be mapped to Darwin’s nine evolutionary propositions. As mankind and its behavior are products of an evolutionary process, the evolutionary process of metadata with its different modes of information is on the verge of a new, semantic era.".
- isqv23n4.2011.04 abstract "Linked Data hold the promise to derive additional value from existing data throughout different sectors, but practitioners currently lack a straightforward methodology and the tools to experiment with Linked Data. This article gives a pragmatic overview of how general purpose Interactive Data Transformation tools (IDTs) can be used to perform the two essential steps to bring data into the Linked Data cloud: data cleaning and reconciliation. These steps are explained with the help of freely available data (Cooper-Hewitt National Design Museum, New York) and tools (Google Refine), making the process repeatable and understandable for practitioners.".
- IJSWIS.2016070103 abstract "Evaluating federated Linked Data queries requires consulting multiple sources on the Web. Before a client can execute queries, it must discover data sources, and determine which ones are relevant. Federated query execution research focuses on the actual execution, while data source discovery is often marginally discussed—even though it has a strong impact on selecting sources that contribute to the query results. Therefore, we introduce a discovery approach for Linked Data interfaces based on hypermedia links and controls, and apply it to federated query execution with Triple Pattern Fragments. In addition, we identify quantitative metrics to evaluate this discovery approach. This article describes generic evaluation measures and results for our concrete approach. With low-cost data summaries as seed, interfaces to eight large real-world datasets can discover each other within 7 minutes. Hypermedia-based client-side querying shows a promising gain of up to 50% in execution time, but demands algorithms that visit a higher number of interfaces to improve result completeness. With these findings, we conclude that using hypermedia for interface discovery brings us closer to a queryable global dataspace. However, clients require more intelligent methods to effectively consume such space.".
- IJSWIS.2017100108 abstract "When researchers formulate search queries to find relevant content on the Web, those queries typically consist of keywords that can only be matched in the content or its metadata. The Web of Data extends this functionality by bringing structure and giving well-defined meaning to the content, and it enables humans and machines to work together using controlled vocabularies. Due to the high degree of mismatches between the structure of the content and the vocabularies in different sources, searching over multiple heterogeneous repositories of structured data is considered challenging. Therefore, we present a semantic search engine for researchers facilitating search in research-related Linked Data. To facilitate high-precision interactive search, we configured the engine to annotate and interlink structured research data with ontologies from various repositories in an effective semantic model. Furthermore, our system is adaptive, as researchers can choose to add their social media accounts for synchronization and efficiently explore new datasets.".
- 0006733106710679 abstract "Modern developments confront us with an ever-increasing amount of streaming data: different sensors in environments like hospitals or factories communicate their measurements to other applications. Having this data at our disposal presents us with a new challenge: the data needs to be integrated into existing frameworks. As the availability of sensors can rapidly change, these frameworks need to be flexible enough to easily incorporate new systems without having to be explicitly configured. Semantic Web applications offer a solution for this by enabling computers to “understand” data. But for them, the sheer amount of data and the different possible queries that can be performed on it can form an obstacle. This paper tackles this problem: we present a formalism to describe stream queries in the ontology context in which they might become relevant. These descriptions enable us to automatically decide, based on the actual setting and the problem to be solved, which sensors should be monitored further and how. This helps us to limit the streaming data taken into account for reasoning tasks and make stream reasoning more performant. We illustrate our approach on a health-care use case where different sensors are used to measure data on patients and their surroundings in a hospital.".
- jwe1540-9589.2045 abstract "Public transit operators often publish their open data in a data dump, but developers with limited computational resources may not have the means to process all this data efficiently. In our prior work we have shown that geospatially partitioning an operator’s network can improve query times for client-side route planning applications by a factor of 2.4. However, it remains unclear whether this works for all network types, or other kinds of applications. To answer these questions, we must evaluate the same method on more networks and analyze the effect of geospatial partitioning on each network separately. In this paper we process three networks in Belgium: (i) the national railways, (ii) the regional operator in Flanders, and (iii) the network of the city of Brussels, using both real and artificially generated query sets. Our findings show that on the regional network, we can make query processing 4 times more efficient, but we could not improve the performance on the city network by more than 12%. Both the network’s topography, and to a lesser extent how users interact with the network, determine how suitable the network is for partitioning. Thus, we come to a negative answer to our question: our method does not work equally well for all networks. Moreover, since the network’s topography is the main determining factor, we expect this finding to apply to other graph-based geospatial data, as well as other Link Traversal-based applications.".
- decentralized-footpaths abstract "Users expect route planners that combine all modes of transportation to propose good journeys to their destination. These route planners use data from several sources such as road networks and schedule-based public transit. We focus on the link between the two; specifically, the walking distances between stops. Research in this field so far has found that computing these paths dynamically is too slow, but that computing all of them results in a quadratically scaling number of edges which is prohibitively expensive in practice. The common solution is to cluster the stops into small unconnected graphs, but this restricts the amount of walking and has a significant impact on the travel times. Moreover, clustering operates on a closed-world assumption, which makes it impractical to add additional public transit services. A decentralized publishing strategy that fixes these issues should thus (i) scale gracefully with the number of stops; (ii) support unrestricted walking; (iii) make it easy to add new services and (iv) support splitting the work among several actors. We introduce a publishing strategy that is based on the Delaunay triangulation of public transit stops, where every triangle edge corresponds to a single footpath that is precomputed. This guarantees that all stops are reachable from one another, while the number of precomputed paths increases linearly with the number of stops. Each public transit service can be processed separately, and combining several operators can be done with a minimal amount of work. Approximating the walking distance with a path along the triangle edges overestimates the actual distance by 20% on average. Our results show that our approach is a middle-ground between completeness and practicality. It consistently overestimates the walking distances, but this seems workable since overestimating the time needed to catch a connection is arguably better than recommending an impossible journey. The estimates could still be improved by combining the great-circle distance with our approximations. Alternatively, different triangulations could be combined to create a more complete graph.".
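The abstract compares walking distances along triangle edges with the great-circle distance; the sketch below shows how such a comparison could be computed, using the haversine formula for the great-circle distance and summing edge lengths along a path. The coordinates are made up, and the triangulation itself is assumed to be precomputed elsewhere.

```typescript
// Sketch of the distance comparison mentioned above: the great-circle
// (haversine) distance between two stops versus the length of a walking path
// that follows precomputed triangulation edges. Stop coordinates are made up.
interface Stop { lat: number; lon: number; }

function greatCircleMeters(a: Stop, b: Stop): number {
  const R = 6371000;                       // mean Earth radius in meters
  const toRad = (d: number) => (d * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h = Math.sin(dLat / 2) ** 2 +
            Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

function pathLengthMeters(path: Stop[]): number {
  let total = 0;
  for (let i = 1; i < path.length; i++) total += greatCircleMeters(path[i - 1], path[i]);
  return total;
}

// Example: direct distance vs. a detour via an intermediate stop on a triangle edge.
const a = { lat: 51.0355, lon: 3.7105 };
const via = { lat: 51.0380, lon: 3.7160 };
const b = { lat: 51.0400, lon: 3.7200 };
console.log(greatCircleMeters(a, b), pathLengthMeters([a, via, b]));
```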
- EYE.pdf abstract "This issue’s installment examines a software program reasoning about the world’s largest knowledge source. Ruben Verborgh and Jos De Roo describe how a small open source project can have a large impact. This is the fourth open source product discussed in the Impact department and the first written in the logic programming language Prolog.".
- icwe2020-main-track abstract "Web-based information services transformed how we interact with public transport. Discovering alternatives to reach destinations and obtaining live updates about them is necessary to optimize journeys and improve the quality of travellers’ experience. However, keeping travellers updated with opportune information is demanding. Traditional Web APIs for live public transport data follow a polling approach and allocate all data processing to either data providers, lowering data accessibility, or data consumers, increasing the costs of innovative solutions. Moreover, data processing load increases further because previously obtained route plans are fully recalculated when live updates occur. In-between solutions that share processing load between clients and servers, and alternative Web API architectures, have not been thoroughly investigated yet. We study performance trade-offs of polling and push-based Web architectures to efficiently publish and consume live public transport data. We implement (i) alternative architectures that allow sharing data processing load between clients and servers, and evaluate their performance following polling- and push-based approaches; (ii) a rollback mechanism that extends the Connection Scan Algorithm to avoid unnecessary full route plan recalculations upon live updates. Evaluations show polling as a more efficient alternative on CPU and RAM but hint towards push-based alternatives when bandwidth is a concern. Clients update route plan results 8–10 times faster with our rollback approach. Smarter API design combining polling and push-based Web interfaces for live public transport data reduces the intrinsic costs of data sharing by equitably distributing the processing load between clients and servers. Future work can investigate more complex multimodal transport scenarios.".
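For context, the sketch below shows the basic earliest-arrival Connection Scan Algorithm that the rollback mechanism extends; the rollback itself is not reproduced here. Transfer times and footpaths are ignored, and the data structures are simplified assumptions rather than the paper's implementation.

```typescript
// Sketch of the basic earliest-arrival Connection Scan Algorithm. Connections
// must be sorted by departure time; transfers and footpaths are ignored.
interface Connection {
  departureStop: string;
  arrivalStop: string;
  departureTime: number;   // e.g. seconds since midnight
  arrivalTime: number;
}

function earliestArrivals(
  connections: Connection[],            // sorted by departureTime
  source: string,
  departureTime: number,
): Map<string, number> {
  const earliest = new Map<string, number>([[source, departureTime]]);
  for (const c of connections) {
    const reachableAt = earliest.get(c.departureStop);
    if (reachableAt !== undefined && reachableAt <= c.departureTime) {
      const current = earliest.get(c.arrivalStop);
      if (current === undefined || c.arrivalTime < current) {
        earliest.set(c.arrivalStop, c.arrivalTime);
      }
    }
  }
  return earliest;
}
```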
- iswc2021-in-use abstract "The European Union Agency for Railways acts as a European authority, tasked with the provision of a legal and technical framework to support harmonized and safe cross-border railway operations throughout the EU. So far, the agency relied on traditional application-centric approaches to support the data exchange among multiple actors interacting within the railway domain. This led, however, to isolated digital environments that consequently added barriers to digital interoperability while increasing the cost of maintenance and innovation. In this work, we show how Semantic Web technologies are leveraged to create a semantic layer for data integration across the base registries maintained by the agency. We validate the usefulness of this approach by supporting route compatibility checks, a highly demanded use case in this domain, which was not available over the agency’s registries before. Our contributions include (i) an official ontology for the railway infrastructure and authorized vehicle types, including 28 reference datasets; (ii) a reusable Knowledge Graph describing the European railway infrastructure; (iii) a cost-efficient system architecture that enables high flexibility for use case development; and (iv) an open source and RDF native Web application to support route compatibility checks. This work demonstrates how data-centric system design, powered by Semantic Web technologies and Linked Data principles, provides a framework to achieve data interoperability and unlock new and innovative use cases and applications. Based on the results obtained during this work, ERA officially decided to make Semantic Web and Linked Data-based approaches the default setting for any future development of the data, registers and specifications under the agency’s remit for data exchange mandated by the EU legal framework. The next steps, which are already underway, include further developing and bringing these solutions to a production-ready state.".
- describing-experiments abstract "Within computer science engineering, research articles often rely on software experiments in order to evaluate contributions. Reproducing such experiments involves setting up software, benchmarks, and test data. Unfortunately, many articles ambiguously refer to software by name only, leaving out crucial details for reproducibility, such as module and dependency version numbers or the configuration of individual components in different setups. To address this, we created the Object-Oriented Components ontology for the semantic description of software components and their configuration. This article discusses the ontology and its application, and demonstrates with a use case how to publish experiments and their software configurations on the Web. In order to enable semantic interlinking between configurations and modules, we published the metadata of all 480,000+ JavaScript libraries on npm as 194,000,000+ RDF triples. Through our work, research articles can refer by URL to fine-grained descriptions of experimental setups. This gets us to accurate reproductions of experiments faster, and facilitates the evaluation of new research contributions with different software configurations. In the future, software could be instantiated automatically based on these descriptions and configurations, and reasoning and querying could be applied to software configurations for meta-research purposes.".
- cs-105 abstract "While most challenges organized so far in the Semantic Web domain are focused on comparing tools with respect to different criteria such as their features and competencies, or exploiting semantically enriched data, the Semantic Web Evaluation Challenges series, co-located with the ESWC Semantic Web Conference, aims to compare them based on their output, namely the produced dataset. The Semantic Publishing Challenge is one of these challenges. Its goal is to involve participants in extracting data from heterogeneous sources on scholarly publications, and producing Linked Data that can be exploited by the community itself. This paper reviews lessons learned from both (i) the overall organization of the Semantic Publishing Challenge, regarding the definition of the tasks, building the input dataset and forming the evaluation, and (ii) the results produced by the participants, regarding the proposed approaches, the used tools, the preferred vocabularies and the results produced in the three editions of 2014, 2015 and 2016. We compared these lessons to other Semantic Web Evaluation challenges. In this paper, we (i) distill best practices for organizing such challenges that could be applied to similar events, and (ii) report observations on Linked Data publishing derived from the submitted solutions. We conclude that higher quality may be achieved when Linked Data is produced as a result of a challenge, because the competition becomes an incentive, while solutions become better with respect to Linked Data publishing best practices when they are evaluated against the rules of the challenge.".
- cs-110 abstract "Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task with no scalability. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.".
- cs-78 abstract "Publication and archival of scientific results is still commonly considered the responsibility of classical publishing companies. Classical forms of publishing, however, which center around printed narrative articles, no longer seem well-suited in the digital age. In particular, there exist currently no efficient, reliable, and agreed-upon methods for publishing scientific datasets, which have become increasingly important for science. In this article, we propose to design scientific data publishing as a web-based bottom-up process, without top-down control of central authorities such as publishing companies. Based on a novel combination of existing concepts and technologies, we present a server network to decentrally store and archive data in the form of nanopublications, an RDF-based format to represent scientific data. We show how this approach allows researchers to publish, retrieve, verify, and recombine datasets of nanopublications in a reliable and trustworthy manner, and we argue that this architecture could be used as a low-level data publication layer to serve the Semantic Web in general. Our evaluation of the current network shows that this system is efficient and reliable.".
- article-demo abstract "The collection of Linked Data is ever-growing and many datasets are frequently being updated. In order to fully exploit the potential of the information that is available in and over historical dataset versions, we need to be able to store and query Linked Datasets efficiently. In this demonstration, we introduce OSTRICH, which is an efficient multi-version triple store with versioned querying support. We demonstrate the capabilities of OSTRICH using a Web-based graphical user interface in which a store can be opened or created. Using this interface, the user is able to query in, between, and over different versions, ingest new versions, and retrieve summarizing statistics.".
- article-iswc2019-journal-ostrich abstract "In addition to their latest version, Linked Open Datasets on the Web can also contain useful information in or between previous versions. In order to exploit this information, we can maintain history in RDF archives. Existing approaches either require much storage space, or they do not meet sufficiently expressive querying demands. In this extended abstract, we discuss an RDF archive indexing technique that has a low storage overhead, and adds metadata for reducing lookup times. We introduce algorithms based on this technique for efficiently evaluating versioned queries. Using the BEAR RDF archiving benchmark, we evaluate our implementation, called OSTRICH. Results show that OSTRICH introduces a new trade-off regarding storage space, ingestion time, and querying efficiency. By processing and storing more metadata during ingestion time, it significantly lowers the average lookup time for versioning queries. Our storage technique reduces query evaluation time through a preprocessing step during ingestion, which only in some cases increases storage space when compared to other approaches. This allows data owners to store and query multiple versions of their dataset efficiently, lowering the barrier to historical dataset publication and analysis.".
- article-jws2018-ostrich abstract "When publishing Linked Open Datasets on the Web, most attention is typically directed to their latest version. Nevertheless, useful information is present in or between previous versions. In order to exploit this historical information in dataset analysis, we can maintain history in RDF archives. Existing approaches either require much storage space, or they expose an insufficiently expressive or efficient interface with respect to querying demands. In this article, we introduce an RDF archive indexing technique that is able to store datasets with a low storage overhead, by compressing consecutive versions and adding metadata for reducing lookup times. We introduce algorithms based on this technique for efficiently evaluating queries at a certain version, between any two versions, and for versions. Using the BEAR RDF archiving benchmark, we evaluate our implementation, called OSTRICH. Results show that OSTRICH introduces a new trade-off regarding storage space, ingestion time, and querying efficiency. By processing and storing more metadata during ingestion time, it significantly lowers the average lookup time for versioning queries. OSTRICH performs better for many smaller dataset versions than for few larger dataset versions. Furthermore, it enables efficient offsets in query result streams, which facilitates random access in results. Our storage technique reduces query evaluation time for versioned queries through a preprocessing step during ingestion, which only in some cases increases storage space when compared to other approaches. This allows data owners to store and query multiple versions of their dataset efficiently, lowering the barrier to historical dataset publication and analysis.".
- QuWeDa_2023_Link_Queue_Analysis_Final.pdf abstract "Link Traversal-based Query Processing (LTQP) is an integrated querying approach that allows the query engine to start with zero knowledge of the data to query and discover data sources on the fly. The query engine starts with some seed documents and dynamically discovers new data sources by dereferencing hyperlinks in previously ingested data. Given the dynamic nature of source discovery, query processing tends to be relatively slow. Optimization techniques exist, such as exploiting existing structural information, but they depend on a deep understanding of the link queue during LTQP. To this end, we investigate the evolution of the types of link sources in the link queue and introduce metrics that describe key link queue characteristics. This paper analyses the link queue to guide future work on LTQP query optimization approaches that exploit structural information within a Solid environment. We find that queries exhibit two different execution patterns, one where the link queue is primarily empty and the other where the link queue fills faster than the engine can process. Our results show that the link queue is not functioning optimally and that our current approach to link discovery is not sufficiently selective.".
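A toy version of the traversal loop behind the link queue is sketched below: seed documents are dequeued, dereferenced, and newly discovered links are enqueued. It only logs the queue length over time; a real LTQP engine such as the one analysed above extracts links from RDF data, applies link-selection policies, and evaluates the query while traversing.

```typescript
// Toy illustration of the link queue during link traversal: start from seed
// documents, dereference each queued URL, and enqueue newly discovered links.
// Naive HTML link extraction is used here purely for illustration.
async function traverse(seeds: string[], maxDocuments: number): Promise<void> {
  const queue: string[] = [...seeds];
  const visited = new Set<string>();

  while (queue.length > 0 && visited.size < maxDocuments) {
    console.log(`queue length: ${queue.length}`);
    const url = queue.shift()!;
    if (visited.has(url)) continue;
    visited.add(url);

    const response = await fetch(url, { headers: { accept: 'text/html' } });
    const body = await response.text();
    // An LTQP engine would extract links from RDF triples instead of HTML.
    for (const match of body.matchAll(/href="(https?:\/\/[^"]+)"/g)) {
      if (!visited.has(match[1])) queue.push(match[1]);
    }
  }
}
```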
- article-ldtraversal-security-short abstract "The societal and economic consequences surrounding Big Data-driven platforms have increased the call for decentralized solutions. However, retrieving and querying data in more decentralized environments requires fundamentally different approaches, whose properties are not yet well understood. Link-Traversal-based Query Processing (LTQP) is a technique for querying over decentralized data networks, in which a client-side query engine discovers data by traversing links between documents. Since decentralized environments are potentially unsafe due to their non-centrally controlled nature, there is a need for client-side LTQP query engines to be resistant against security threats aimed at the query engine’s host machine or the query initiator’s personal data. As such, we have performed an analysis of potential security vulnerabilities of LTQP. This article provides an overview of security threats in related domains, which are used as inspiration for the identification of 10 LTQP security threats. This list of security threats forms a basis for future work in which mitigations for each of these threats need to be developed and tested for their effectiveness. With this work, we start filling the unknowns for enabling query execution over decentralized environments. Aside from future work on security, wider research will be needed to uncover missing building blocks for enabling true data decentralization.".
- article-quoted-triples-index abstract "The upcoming RDF 1.2 recommendation is scheduled to introduce the concept of quoted triples, which allows statements to be made about other statements. Since quoted triples enable new forms of data access in SPARQL 1.2, in the form of quoted triple patterns, there is a need for new indexing strategies that can efficiently handle these data access patterns. As such, we explore and evaluate different in-memory indexing approaches for quoted triples. In this paper, we investigate four indexing approaches, and evaluate their performance over an artificial dataset with custom triple pattern queries. Our findings show that the so-called indexed quoted triples dictionary vastly outperforms other approaches in terms of query execution time at the cost of increased storage size and ingestion time. Our work shows that indexing quoted triples in a dictionary separate from non-quoted RDF terms achieves good performance, and can be implemented using well-known indexing techniques into existing systems. Therefore, we illustrate that the addition of quoted triples into the RDF stack can be achieved in a performant manner.".
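The “indexed quoted triples dictionary” can be pictured as a dictionary for quoted triples that is kept separate from the dictionary for regular RDF terms, as sketched below. The encoding details (e.g. negative identifiers for quoted triples) are illustrative assumptions, not the evaluated implementation.

```typescript
// Hedged sketch of a separate dictionary for quoted triples: quoted triples
// get identifiers distinct from regular term identifiers, so triple patterns
// over quoted triples can be resolved through integer lookups.
class TermDictionary {
  private ids = new Map<string, number>();
  encode(term: string): number {
    let id = this.ids.get(term);
    if (id === undefined) { id = this.ids.size + 1; this.ids.set(term, id); }
    return id;
  }
}

class QuotedTripleDictionary {
  private ids = new Map<string, number>();
  constructor(private terms: TermDictionary) {}
  encode(s: string, p: string, o: string): number {
    // Key the quoted triple by the encoded ids of its components.
    const key = [this.terms.encode(s), this.terms.encode(p), this.terms.encode(o)].join(',');
    let id = this.ids.get(key);
    if (id === undefined) { id = this.ids.size + 1; this.ids.set(key, id); }
    return -id;  // e.g. negative ids distinguish quoted triples from plain terms
  }
}
```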
- Article-ESWC2024-DFDP abstract "Forms are key to bidirectional communication on the Web: without them, end-users would be unable to place online orders or file support tickets. Organizations often need multiple, highly similar forms, which currently require multiple implementations. Moreover, the data is tightly coupled to the application, restricting the end-user from reusing it with other applications, or storing the data somewhere else. Organizations and end-users have a need for a technique to create forms that are more controllable, reusable, and decentralized. To address this problem, we introduce the Declarative Form Description Pipeline (DFDP) that meets these requirements. DFDP achieves controllability through end-users’ editable declarative form descriptions. Reusability for organizations is ensured through descriptions of the form fields and associated actions. Finally, by leveraging a decentralized environment like Solid, the application is decoupled from the storage, preserving end-user control over their data. In this paper, we introduce and explain how such a declarative form description can be created and used without assumptions about the viewing environment or data storage. We show how separate applications can interoperate and be interchanged by using a description that contains details for form rendering and data submission decisions using a form, policy, and rule ontology. Furthermore, we prove how this approach solves the shortcomings of traditional Web forms. Our proposed pipeline enables organizations to save time by building similar forms without starting from scratch. Similarly, end-users can save time by letting machines prefill the form with existing data. Additionally, DFDP empowers end-users to be in control of the application they use to manage their data in a data store. User study results provide insights into how usability can be further improved, for example by providing automatic suggestions based on the field labels entered.".
- WhatsInAPod abstract "The Solid vision aims to make data independent of applications through technical specifications, which detail how to publish and consume permissioned data across multiple autonomous locations called “pods”. The current document-centric interpretation of Solid, wherein a pod is solely a hierarchy of Linked Data documents, cannot fully realize this envisaged independence. Applications are left to define their own APIs within the Solid Protocol, which leads to fundamental interoperability problems and the need for associated workarounds. The broader long-term vision for Solid is confounded with the concrete HTTP interface to pods today, leading to a narrower solution space to address these core issues. We examine the mismatch between the vision and the prevalent document-centric interpretation, and propose a reconciliatory graph-centric interpretation wherein a pod is fundamentally a knowledge graph. In this article, we contrast the existing and proposed interpretations in terms of how they support the Solid vision. We argue that the knowledge-centric interpretation can improve pod access through different Web APIs that act as views in a database sense. We show that our zoomed-out interpretation provides improved opportunities for storage, publication, and querying of decentralized data in more flexible and sustainable ways. These insights are crucial to reduce the dependency of Solid apps on implicit API semantics and local assumptions about the shape and organization of data and the resulting performance. The suggested broader interpretation can guide Solid through its evolution into a heterogeneous yet interoperable ecosystem that accommodates a multitude of read/write data access patterns.".