Matches in Ruben’s data for { ?s <http://schema.org/abstract> ?o }
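The triple pattern above can be evaluated with any SPARQL-capable tool. The following minimal sketch uses Python's rdflib against a hypothetical local dump named ruben-data.ttl (the file name, location, and serialization are assumptions; the actual data is published as Linked Data on ruben.verborgh.org):

```python
from rdflib import Graph

# Hypothetical local copy of Ruben's data; adjust path and format to the real source.
g = Graph()
g.parse("ruben-data.ttl", format="turtle")

# The same pattern as in the heading: { ?s <http://schema.org/abstract> ?o }
query = """
SELECT ?s ?o
WHERE { ?s <http://schema.org/abstract> ?o . }
"""

for row in g.query(query):
    print(row.s, str(row.o)[:80] + "...")
```

Each result row corresponds to one of the matches listed below.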
- bxt147 abstract "In this paper, we report on the task of near-duplicate photo detection in the context of events that get shared on multiple social networks. When people attend events, they increasingly share event-related photos publicly on social networks to let their social network contacts relive and witness the attended events. In the past, we have worked on methods to accumulate such public user-generated multimedia content so that it can be used to summarize events visually in the form of media galleries or slideshows. Therefore, methods for the deduplication of near-duplicate photos of the same event are required in order to ensure the diversity of the generated media galleries or slideshows. First, we introduce the social-network-specific reasons and challenges that cause near-duplicate photos. Second, we introduce an algorithm for the task of deduplicating near-duplicate photos stemming from social networks. Finally, we evaluate the algorithm’s results and shortcomings.".
- fqt067 abstract "Unstructured metadata fields such as “description” offer tremendous value for users to understand cultural heritage objects. However, this type of narrative information is of little direct use within a machine-readable context due to its unstructured nature. This paper explores the possibilities and limitations of Named-Entity Recognition (NER) and Term Extraction (TE) to mine such unstructured metadata for meaningful concepts. These concepts can be used to leverage otherwise limited searching and browsing operations, but they can also play an important role to foster Digital Humanities research. In order to catalyze experimentation with NER and TE, the paper proposes an evaluation of the performance of three third-party entity extraction services through a comprehensive case study, based on the descriptive fields of the Smithsonian Cooper-Hewitt National Design Museum in New York. In order to cover both NER and TE, we first offer a quantitative analysis of named-entities retrieved by the services in terms of precision and recall compared to a manually annotated gold-standard corpus, then complement this approach with a more qualitative assessment of relevant terms extracted. Based on the outcomes of this double analysis, the conclusions present the added value of entity extraction services, but also indicate the dangers of uncritically using NER and/or TE, and by extension Linked Data principles, within the Digital Humanities. All metadata and tools used within the paper are freely available, making it possible for researchers and practitioners to repeat the methodology. By doing so, the paper offers a significant contribution towards understanding the value of entity recognition and disambiguation for the Digital Humanities.".
- JD-03-2017-0040 abstract "The purpose of this paper is to detail a low-cost, low-maintenance publishing strategy aimed at unlocking the value of Linked Data collections held by libraries, archives and museums (LAMs). The shortcomings of commonly used Linked Data publishing approaches are identified, and the current lack of substantial collections of Linked Data exposed by LAMs is considered. To improve on the discussed status quo, a novel approach for publishing Linked Data is proposed and demonstrated by means of an archive of DBpedia versions, which is queried in combination with other Linked Data sources. The authors show that the approach makes publishing Linked Data archives easy and affordable, and supports distributed querying without causing untenable load on the Linked Data sources. The proposed approach significantly lowers the barrier for publishing, maintaining, and making Linked Data collections queryable. As such, it offers the potential to substantially grow the distributed network of queryable Linked Data sources. Because the approach supports querying without causing unacceptable load on the sources, the queryable interfaces are expected to be more reliable, allowing them to become integral building blocks of robust applications that leverage distributed Linked Data sources. The novel publishing strategy significantly lowers the technical and financial barriers that LAMs face when attempting to publish Linked Data collections. The proposed approach yields Linked Data sources that can reliably be queried, paving the way for applications that leverage distributed Linked Data sources through federated querying.".
- JD-07-2013-0098 abstract "The paper revisits the Representational State Transfer (REST) architectural style a decade after its conception and analyses its relevance for addressing current challenges from the Library and Information Science (LIS) discipline. Conceptual aspects of REST are reviewed and a generic architecture to support REST is presented. The relevance of the architecture is demonstrated with the help of a case-study based on the collection registration database of the Cooper-Hewitt National Design Museum. We argue that the “resources and representations” model of REST is a sustainable way for the management of Web resources in a context of constant technological evolutions. When making information resources available on the Web, a resource-oriented publishing model can avoid the costs associated with the creation of multiple interfaces. This paper re-examines the conceptual merits of REST and translates the architecture into actionable recommendations for the LIS discipline.".
- BigData.2016.7840981 abstract "Topic Modelling (TM) has gained momentum over the last few years within the humanities to analyze topics represented in large volumes of full text. This paper proposes an experiment with the use of TM based on a large subset of digitized archival holdings of the European Commission (EC). Currently, millions of scanned and OCR’ed files are available and hold the potential to significantly change the way historians of the construction and evolution of the European Union can perform their research. However, due to a lack of resources, only minimal metadata are available on a file and document level, seriously undermining the accessibility of this archival collection. The article explores in an empirical manner the possibilities and limits of TM to automatically extract key concepts from a large body of documents spanning multiple decades. By mapping the topics to headings of the EUROVOC thesaurus, the proof of concept described in this paper offers the future possibility to represent the identified topics with the help of a hierarchical search interface for end-users.". (A topic-modelling sketch follows after this list.)
- COMPSACW.2013.29 abstract "This paper describes the mechanisms involved in accessing provenance on the Web, according to the new W3C PROV specifications, and how end-users can process this information to make basic trust assessments. Additionally, we illustrate this principle by implementing a practical use case, namely Tim Berners-Lee’s vision of the “Oh, yeah?” button, enabling users to make trust assessments about documents on the web. This implementation leverages the W3C PROV specification to provide user-friendly access to the provenance of Web pages. While the extension described in this paper is specific to one browser, the majority of its components are browser-agnostic.".
- FITCE53297.2021.9588540 abstract "The Web is evolving more and more into a small set of walled gardens where a very small number of platforms determine the way that people get access to the Web (i.e. “log in via Facebook”). As a pushback against this ongoing centralization of data and services on the Web, decentralization efforts are taking place that move data into personal data vaults. This positioning paper discusses a potential personal data vault society and the required research steps in order to get there. Emphasis is placed on the needed interplay between technological and business research perspectives. The focus is on the situation of Flanders, based on a recent announcement of the Flemish government to set up a Data Utility Company. The concepts as well as the suggested path for adoption can easily be extended, however, to other situations/regions as well.".
- ICSC.2016.55 abstract "In this paper, we propose and investigate a novel distance-based approach for measuring the semantic dissimilarity between two concepts in a knowledge graph. The proposed Normalized Semantic Web Distance (NSWD) extends the idea of the Normalized Web Distance, which is used to determine the dissimilarity between two textual terms, and utilizes additional semantic properties of nodes in a knowledge graph. We evaluate our proposal on two different knowledge graphs: Freebase and DBpedia. While the NSWD achieves a correlation of up to 0.58 with human similarity assessments on the established Miller-Charles benchmark of 30 term-pairs on the Freebase knowledge graph, it reaches an even higher correlation of 0.69 in the DBpedia knowledge graph. We thus conclude that the proposed NSWD is an efficient and effective distance-based approach for assessing semantic dissimilarity in very large knowledge graphs.". (The underlying distance formula is shown after this list.)
- IOT.2014.7030116 abstract "We present an approach that combines semantic metadata and reasoning with a visual modeling tool to enable the goal-driven configuration of smart environments for end users. In contrast to process-driven systems where service mashups are statically defined, this approach makes use of embedded semantic API descriptions to dynamically create mashups that fulfill the user’s goal. The main advantage of the presented system is its high degree of flexibility, as service mashups can adapt to dynamic environments and are fault-tolerant with respect to individual services becoming unavailable. To support end users in expressing their goals, we integrated a visual programming tool with our system. This tool enables users to model the desired state of their smart environment graphically and thus hides the technicalities of the underlying semantics and the reasoning. Possible applications of the presented system include the configuration of smart homes to increase individual well-being, and reconfigurations of smart environments, for instance in the industrial automation or healthcare domains.".
- MC.2014.296 abstract "Open Governments use the Web as a global dataspace for datasets. It is in the interest of these governments to be interoperable with other governments worldwide, yet there is currently no way to identify relevant datasets to be interoperable with and there is no way to measure the interoperability itself. In this article we discuss the possibility of comparing identifiers used within various datasets as a way to measure semantic interoperability. We introduce three metrics to express the interoperability between two datasets: the identifier interoperability, the relevance and the number of conflicts. The metrics are calculated from a list of statements which indicate for each pair of identifiers in the system whether they identify the same concept or not. While a lot of effort is needed to collect these statements, the return is high: not only are relevant datasets identified, but machine-readable feedback is also provided to the data maintainer.".
- NWeSP.2011.6088202 abstract "The social networking website Facebook offers its users a feature called “status updates” (or just “status”), which allows them to create microposts directed to all their contacts, or a subset thereof. Readers can respond to microposts, or additionally click a “Like” button to show their appreciation for a certain micropost. Adding semantic meaning in the sense of unambiguous intended ideas to such microposts can, for example, be achieved via Natural Language Processing (NLP). Therefore, we have implemented a RESTful mash-up NLP API based on a combination of several third party NLP APIs in order to retrieve more accurate results in the sense of emergence. In consequence, our API uses third party APIs opaquely in the background in order to deliver its output. In this paper, we describe how one can keep track of provenance, and credit back the contributions of each single API to the combined result of all APIs. In addition to that, we show how the existence of provenance metadata can help understand the way a combined result is formed, and optimize the result combination process. Therefore, we use the HTTP Vocabulary in RDF and the Provenance Vocabulary. The main contribution of our work is a description of how provenance metadata can be automatically added to the output of mash-up APIs like the one presented in this paper.".
- NWeSP.2011.6088208 abstract "Hyperlinks and forms let humans navigate with ease through websites they have never seen before. In contrast, automated agents can only perform preprogrammed actions on Web services, reducing their generality and restricting their usefulness to a specialized domain. Many of the employed services call themselves RESTful, although they neglect the hypermedia constraint as defined by Roy T. Fielding, stating that the application state should be driven by hypertext. This lack of link usage on the Web of services severely limits agents in what they can do, while connectedness forms a primary feature of the human Web. An urgent need for more intelligent agents becomes apparent, and in this paper, we demonstrate how the conjunction of functional service descriptions and hypermedia links leads to advanced, interactive agent behavior. We propose a new mode for our previously introduced semantic service description format RESTdesc, providing the mechanisms for agents to consume Web services based on links, similar to human browsing strategies. We illustrate the potential of these descriptions by a use case that shows the enhanced capabilities they offer to automated agents, and explain how this is vital for the future Web.".
- QoMEX.2012.6263875 abstract "In this paper, we present and define aesthetic principles for the automatic generation of media galleries based on media items retrieved from social networks that—after a ranking and pruning step—can serve to authentically summarize events and their atmosphere from a visual and an audial standpoint.".
- TASE.2016.2533321 abstract "One of the central research challenges in the Internet of Things and Ubiquitous Computing domains is how users can be enabled to “program” their personal and industrial smart environments by combining services that are provided by devices around them. We present a service composition system that enables the goal-driven configuration of smart environments for end users by combining semantic metadata and reasoning with a visual modeling tool. In contrast to process-driven approaches where service mashups are statically defined, we make use of embedded semantic API descriptions to dynamically create mashups that fulfill the user’s goal. The main advantage of our system is its high degree of flexibility, as service mashups can adapt to dynamic environments and are fault-tolerant with respect to individual services becoming unavailable. To support users in expressing their goals, we integrated a visual programming tool with our system that allows users to model the desired state of a smart environment graphically, thereby hiding the technicalities of the underlying semantics. Possible applications of the presented system include the management of smart homes to increase individual well-being, and reconfigurations of smart environments, for instance in the industrial automation or healthcare domains.".
- 2307819.2307828 abstract "The early visions for the Semantic Web, from the famous 2001 Scientific American article by Berners-Lee et al., feature intelligent agents that can autonomously perform tasks like discovering information, scheduling events, finding execution plans for complex operations, and, in general, use reasoning techniques to come up with sense-making and traceable decisions. While today—more than ten years later—the building blocks (1) resource-oriented REST infrastructure, (2) Web APIs, and (3) Linked Data are in place, the envisioned intelligent agents have not landed yet. In this paper, we explain why capturing functionality is the connection between those three building blocks, and introduce the functional API description format RESTdesc that creates this bridge between hypermedia APIs and the Semantic Web. Rather than adding yet another component to the Semantic Web stack, RESTdesc offers instead concise descriptions that reuse existing vocabularies to guide hypermedia-driven agents. Its versatile capabilities are illustrated by a real-life agent use case for Web browsers wherein we demonstrate that RESTdesc functional descriptions are capable of fulfilling the promise of autonomous agents on the Web.".
- 2487788.2488182 abstract "Hypermedia links and controls drive the Web by transforming information into affordances through which users can choose actions. However, publishers of information cannot predict all actions their users might want to perform and therefore, hypermedia can only serve as the engine of application state to the extent the user’s intentions align with those envisioned by the publisher. In this paper, we introduce distributed affordance, a concept and architecture that extends application state to the entire Web. It combines information inside the representation and knowledge of action providers to generate affordance from the user’s perspective. Unlike similar approaches such as Web Intents, distributed affordance scales both in the number of actions and the number of action providers, because it is resource-oriented instead of action-oriented. A proof-of-concept shows that distributed affordance is a feasible strategy on today’s Web.".
- 2806416.2806642 abstract "In order to assess the trustworthiness of information on social media, a consumer needs to understand where this information comes from, and which processes were involved in its creation. The entities, agents and activities involved in the creation of a piece of information are referred to as its provenance, which was standardized by W3C PROV. However, current social media APIs cannot always capture the full lineage of every message, leaving the consumer with incomplete or missing provenance, which is crucial for judging the trust it carries. Therefore in this paper, we propose an approach to reconstruct the provenance of messages on social media on multiple levels. To obtain a fine-grained level of provenance, we use an existing approach to reconstruct information cascades with high certainty, and map them to PROV using the PROV-SAID extension for social media. To obtain a coarse-grained level of provenance, we adapt a similarity-based, fuzzy provenance reconstruction approach previously applied to news. We illustrate our approach by providing the reconstructed provenance of a limited social media dataset gathered during the 2012 Olympics, for which we were able to reconstruct a significant number of previously unidentified connections.".
- 2814864.2814873 abstract "The RDF data model allows the description of domain-level knowledge that is understandable by both humans and machines. RDF data can be derived from different source formats and diverse access points, ranging from databases or files in CSV format to data retrieved from Web APIs in JSON, Web Services in XML or any other speciality formats. To this end, vocabularies such as RML were introduced to uniformly define how data in multiple heterogeneous sources is mapped to the RDF data model, independently of their original format. This approach results in mapping definitions that are machine-processable and interoperable. However, the way in which this data is accessed and retrieved still remains hard-coded, as corresponding descriptions are often not available or not taken into account. In this paper, we introduce an approach that takes advantage of widely-accepted vocabularies, originally used to advertise services or datasets, such as Hydra or DCAT, to define how to access Web-based or other data sources. Consequently, the generation of RDF representations is facilitated, as the description of the interaction models with the original data remains independent, interoperable and granular.".
- 2814864.2814892 abstract "A lot of Linked Data on the Web is dynamic. Despite the existing query solutions and implementations, crucial unresolved issues remain. This poster presents a novel approach to update SPARQL results client-side by exchanging fragment patches. We aim at a sustainable solution which balances the load and reduces bandwidth. Therefore, our approach benefits from reusing unchanged data and minimizing data transfer size. By only working with patches, the load on the server is minimal. Also, the bandwidth usage is low, since only relevant changes are transferred to the client.".
- 2872518.2891069 abstract "In the field of smart cities, researchers need an indication of how people move in and between cities. Yet, getting statistics of travel flows within public transit systems has proven to be troublesome. In order to get an indication of public transit travel flows in Belgium, we analyzed the query logs of the iRail API, a highly expressive route planning API for the Belgian railways. We were able to study ∼100k to 500k requests for each month between October 2012 and November 2015, which is between 0.56% and 1.66% of the number of monthly passengers. Using data visualizations, we illustrate the commuting patterns in Belgium and confirm that Brussels, the capital, acts as a central hub. The Flemish region appears to be polycentric, while in the Walloon region, everything converges on Brussels. The findings correspond to the real travel demand, according to experts of the passenger federation Trein Tram Bus. We conclude that query logs of route planners are of high importance in getting an indication of travel flows. However, travel intentions could be captured more accurately using dedicated HTTP POST requests.".
- 2910019.2910027 abstract "Public administrations are often still organised in vertical, closed silos. The lack of common data standards (common data models and reference data) for exchanging information between administrations in a cross-domain and/or cross-border setting stands in the way of digital public services and automated flow of information between public administrations. Core data models address this issue, but are often created within the closed environment of a country or region and within one policy domain. A lack of insight exists in understanding and managing the life-cycle of these initiatives on public administration information systems for data modelling and data exchange. In this paper, we outline state-of-the-art implementations and vocabularies linked to the core data models. In particular, we inventoried and selected existing core data models and identified tendencies in current practices based on the criteria creation, use, maintenance and coordination. Based on this, the survey suggests policy and information management recommendations pointing to best practices regarding core data model implementations and their role in linking isolated data silos. Finally, we highlight the differences in their coordination and maintenance, depending on the creation and use.".
- 2910019.2910030 abstract "Travelers expect access to tourism information at anytime, anywhere, with any media. Mobile tourism guides, accessible via the Web, provide an omnipresent approach to this. Thereby it is expensive and not trivial to (re)model, translate and transform data over and over. This inhibits many players, including governments, in developing such applications. We report on our experience in running a project on mobile tourism in Flanders, Belgium where we develop a methodology and reusable formalization for the data disclosure. We apply open data standards to achieve a reusable and interoperable datahub for mobile tourism. We organized working groups, resulting in a re-usable formal specification and serialization of the domain model that is immediately usable for building mobile tourism applications. This increased the awareness and led to semantic convergence, which is forming a regional foundation to develop sustainable mobile guides for tourism.".
- 2928294.2928304 abstract "Linked Data storage solutions often optimize for low latency querying and quick responsiveness. Meanwhile, in the back-end, offline ETL processes take care of integrating and preparing the data. In this paper we explain a workflow and the results of a benchmark that examines which Linked Data storage solution and setup should be chosen for different dataset sizes to optimize the cost-effectiveness of the entire ETL process. The benchmark executes diversified stress tests on the storage solutions. The results include an in-depth analysis of four mature Linked Data solutions with commercial support and full SPARQL 1.1 compliance. Whereas traditional benchmark studies generally deploy the triple stores on premises using high-end hardware, this benchmark uses publicly available cloud machine images for reproducibility and runs on commodity hardware. All stores are tested using their default configuration. In this setting Virtuoso shows the best performance in general. The other three stores show competitive results and have disjunct areas of excellence. Finally, it is shown that each store’s performance heavily depends on the structural properties of the queries, giving an indication of where vendors can focus their optimization efforts.".
- 3014087.3014096 abstract "Each government level uses its own information system. At the same time citizens expect a user-centric approach and instant access to their data or to open government data. Therefore the applications at various government levels need to be interoperable in support of the “once-only principle”: data is inputted and registered only once and then reused. Given government budget constraints and it being an expensive and non-trivial task to (re)model, translate and transform data over and over, public administrations need to reduce interoperability costs. This is achieved by semantically aligning the information between the different information systems of each government level. Semantically interoperable systems facilitate citizen-centered e-government services. This paper illustrates how the OSLO program paved the way bottom-up from a broad basis of stakeholders towards a government-endorsed strategy. OSLO applied a more generic process and methodology and provides practical insights on how to overcome the encountered hurdles: political support and adoption; reaching semantic agreement. The lessons learned in the region of Flanders, Belgium can speed up the process in other countries that face the complexity of integrating information intensive processes between different applications, administrations and government levels.".
- 3041021.3051699 abstract "On dense railway networks—such as in Belgium—train travelers are frequently confronted with overly occupied trains, especially during peak hours. Crowdedness on trains leads to a deterioration in the quality of service and has a negative impact on the well-being of the passenger. In order to stimulate travelers to consider less crowded trains, the iRail project wants to show an occupancy indicator in their route planning applications by the means of predictive modelling. As there is no official occupancy data available, training data is gathered by crowdsourcing using the Web app iRail.be and the Railer application for iPhone. Users can indicate their departure & arrival station, at what time they took a train and classify the occupancy of that train into the classes: low, medium or high. While preliminary results on a limited data set indicate that the models do not yet perform sufficiently well, we are convinced that with further research and a larger amount of data, our predictive model will be able to achieve higher predictive performances. All datasets used in the current research are, for that purpose, made publicly available under an open license on the iRail website and in the form of a Kaggle competition. Moreover, an infrastructure is set up that automatically processes new logs submitted by users in order for our model to continuously learn. Occupancy predictions for future trains are made available through an API.".
- 3041021.3054210 abstract "A large amount of public transport data is made available by many different providers, which makes RDF a great method for integrating these datasets. Furthermore, this type of data provides a great source of information that combines both geospatial and temporal data. These aspects are currently undertested in RDF data management systems, because of the limited availability of realistic input datasets. In order to bring public transport data to the world of benchmarking, we need to be able to create synthetic variants of this data. In this paper, we introduce a dataset generator with the capability to create realistic public transport data. This dataset generator, and the ability to configure it on different levels, makes it easier to use public transport data for benchmarking with great flexibility.".
- 3148011.3154467 abstract "While humans browse the Web by following links, these hypermedia links can also be used by machines for browsing. While efforts such as Hydra semantically describe the hypermedia controls on Web interfaces to enable smarter interface-agnostic clients, they are largely limited to the input parameters to interfaces, and clients therefore do not know what response to expect from these interfaces. In order to convey such expectations, interfaces need to declaratively describe the response structure of their parameterized hypermedia controls. We therefore explored techniques to represent this parameterized response structure in a generic but expressive way. In this work, we discuss four different approaches for declaring a response structure, and we compare them based on a model that we introduce. Based on this model, we conclude that a SHACL shape-based approach can be used for declaring such a parameterized response structure, as it conforms to the REST architectural style that has helped shape the Web into its current form.".
- 3184558.3190666 abstract "Structured Knowledge on the Web had an intriguing history before it became successful. We briefly revisit this history, before we go into the longer discussion about how structured knowledge on the Web should be devised such that it benefits even more applications. Core to this discussion will be issues like trust, information infrastructure usability and resilience, promising realms of structured knowledge, and principles and practices of data sharing.".
- 3184558.3191650 abstract "For smart decision making, user agents need real-time and historic access to open data from sensors installed in the public domain. In contrast to a closed environment, for Open Data and federated query processing algorithms, the data publisher cannot anticipate specific questions in advance, nor can it deal with poor cost-efficiency of the server interface when the number of data consumers increases. When publishing observations from sensors, different fragmentation strategies can be thought of depending on how the historic data needs to be queried. Furthermore, both publish/subscribe and polling strategies exist to publish real-time updates. Each of these strategies comes with its own trade-offs regarding cost-efficiency of the server interface, user-perceived performance and CPU use. A polling strategy where multiple observations are published in a paged collection was tested in a proof of concept for parking space availabilities. In order to understand the different resource trade-offs presented by publish/subscribe and polling publication strategies, we devised an experiment on two machines, for a scalability test. The preliminary results were inconclusive and suggest that more large-scale tests are needed in order to see a trend. While the large-scale tests will be performed in future work, the proof of concept helped to identify the technical Open Data principles for the 13 biggest cities in Flanders.".
- 3308560.3316520 abstract "For better traffic flow and making better policy decisions, the city of Antwerp is connecting traffic lights to the Internet. The live “time to green” only tells part of the story: the historical values also need to be preserved and made accessible to everyone. We propose (i) an ontology for describing the topology of an intersection and the signal timing of traffic lights, (ii) a specification to publish these historical and live data with Linked Data Fragments and (iii) a method to preserve the published data in the long-term. We showcase the applicability of our specification with the opentrafficlights.org project where an end-user can see the live count-down as well as a line chart showing the historic “time to green” of a traffic light. We found that publishing traffic light data as time-sorted Linked Data Fragments allows synchronizing and reusing an archive to retrieve historical observations. Long-term preservation with tape storage becomes feasible when archives shift from byte preservation to knowledge preservation by combining Linked Data Fragments.".
- 3323878.3325802 abstract "To unlock the value of increasingly available data in high volumes, we need flexible ways to integrate data across different sources. While semantic integration can be provided through RDF generation, current generators insufficiently scale in terms of volume. Generators are limited by memory constraints. Therefore, we developed the RMLStreamer, a generator that parallelizes the ingestion and mapping tasks of RDF generation across multiple instances. In this paper, we analyze which aspects are parallelizable and we introduce an approach for parallel RDF generation. We describe how we implemented our proposed approach, in the frame of the RMLStreamer, and how the resulting scaling behavior compares to other RDF generators. The RMLStreamer ingests data at a 50% faster rate than existing generators through parallel ingestion.".
- 3366424.3383528 abstract "In healthcare, the aging of the population is resulting in a gradual shift from residential care to home care, requiring reliable follow-up of elderly people by a whole network of care providers. The environment of these patients is increasingly being equipped with different monitoring devices, which make it possible to obtain insight into the current condition of the patient and their environment. However, current monitoring platforms that support care providers are centralized and not personalized, reducing performance, scalability, autonomy and privacy. Because the available data is only exposed through custom APIs, profile knowledge cannot be efficiently exchanged, which is required to provide optimal care. Therefore, this paper presents a distributed data-driven platform, built on Semantic Web technologies, that enables the integration of all profile knowledge in order to deliver personalized continuous home care. It provides a distributed smart monitoring service, which locally monitors only the relevant sensors according to the patient’s profile and infers personalized decisions when analyzing events. Moreover, it provides a medical treatment planning service, which composes treatment plans tailored to each patient, including personalized quality of service parameters allowing a doctor to select the optimal plan. To illustrate how the platform delivers these services, the paper also presents a demonstrator using a realistic home care scenario.".
- 3487553.3524630 abstract "The Solid project aims to restore end-users’ control over their data by decoupling services and applications from data storage. To realize data governance by the user, the Solid Protocol 0.9 relies on Web Access Control, whose expressivity and interpretability are limited. In contrast, recent privacy and data protection regulations impose strict requirements on personal data processing applications and the scope of their operation. The Web Access Control mechanism lacks the granularity and contextual awareness needed to enforce these regulatory requirements. Therefore, we suggest a possible architecture for relating Solid’s low-level technical access control rules with higher-level concepts such as the legal basis and purpose for data processing, the abstract types of information being processed, and the data sharing preferences of the data subject. Our architecture combines recent technical efforts by the Solid community panels with prior proposals made by researchers on the use of ODRL and SPECIAL policies as an extension to Solid’s authorization mechanism. While our approach appears to avoid a number of pitfalls identified in previous research, further work is needed before it can be implemented and used in a practical setting.".
- 150696 abstract "In this paper, the design of a service-oriented architecture for multi-sensor surveillance in smart homes is presented as an integrated solution enabling automatic deployment, dynamic selection and composition of sensors. Sensors are implemented as web-connected devices, with a uniform Web API. RESTdesc is used to describe the sensors and a novel solution is presented to automatically compose Web APIs that can be applied with existing Semantic Web reasoners. We evaluated the solution by building a smart Kinect sensor that is able to dynamically switch between IR and RGB and optimizing person detection by incorporating feedback from pressure sensors, as such demonstrating the collaboration among sensors to enhance detection of complex events. The performance results show that the platform scales for many Web APIs as composition time remains limited to a few hundred milliseconds in almost all cases.".
- CSIS151028031D abstract "Recent developments on sharing research results and ideas on the Web, such as research collaboration platforms like Mendeley or ResearchGate, enable novel ways to explore research information. Current search interfaces in this field focus mostly on narrowing down the search scope through faceted search, keyword matching, or filtering. The interactive visual aspect and the focus on exploring relationships between items in the results has not sufficiently been addressed before. To facilitate this exploration, we developed ResXplorer, a search interface that interactively visualizes linked data of research-related sources. By visualizing resources such as conferences, publications and proceedings, we reveal relationships between researchers and those resources. We evaluate our search interface by measuring how it affects the search productivity of targeted lean users. Furthermore, expert users reviewed its information retrieval potential and compared it against both popular academic search engines and highly specialized academic search interfaces. The results indicate how well lean users perceive the system and expert users rate it for its main goal: revealing relationships between resources for researchers.".
- 978-1-61499-562-3-37 abstract "Various methods are needed to extract information from current (digital) comics. Furthermore, the use of different (proprietary) formats by comic distribution platforms causes an overhead for authors. To overcome these issues, we propose a solution that makes use of the EPUB 3 specification, additionally leveraging the Open Web Platform to support animations, reading assistance, audio and multiple languages in a single format, by using our JavaScript library comicreader.js. We also provide administrative and descriptive metadata in the same format by introducing a new ontology: Dicera. Our solution is complementary to the current extraction methods, on the one hand because they can help with metadata creation, and on the other hand because the machine-understandable metadata eases their use. While the reading system support for our solution is currently limited, it can offer all features needed by current comic distribution platforms. When comparing comics generated by our solution to EPUB 3 textbooks, we observed an increase in file size, mainly due to the use of images. In future work, our solution can be further improved by extending the presentation features, investigating different types of comics, studying the use of new EPUB 3 extensions, and by incorporating it in digital book authoring environments.".
- SW-180319 abstract "When benchmarking RDF data management systems such as public transport route planners, system evaluation needs to happen under various realistic circumstances, which requires a wide range of datasets with different properties. Real-world datasets are almost ideal, as they offer these realistic circumstances, but they are often hard to obtain and inflexible for testing. For these reasons, synthetic dataset generators are typically preferred over real-world datasets due to their intrinsic flexibility. Unfortunately, many synthetic datasets that are generated within benchmarks are insufficiently realistic, raising questions about the generalizability of benchmark results to real-world scenarios. In order to benchmark geospatial and temporal RDF data management systems such as route planners with sufficient external validity and depth, we designed PODIGG, a highly configurable generation algorithm for synthetic public transport datasets with realistic geospatial and temporal characteristics comparable to those of their real-world variants. The algorithm is inspired by real-world public transit network design and scheduling methodologies. This article discusses the design and implementation of PODIGG and validates the properties of its generated datasets. Our findings show that the generator achieves a sufficient level of realism, based on the existing coherence metric and new metrics we introduce specifically for the public transport domain. Thereby, PODIGG provides a flexible foundation for benchmarking RDF data management systems with geospatial and temporal data.".
- SW-190358 abstract "Knowledge graphs, which contain annotated descriptions of entities and their interrelations, are often generated using rules that apply semantic annotations to certain data sources. (Re)using ontology terms without adhering to the axioms defined by their ontologies results in inconsistencies in these graphs, affecting their quality. Methods and tools were proposed to detect and resolve inconsistencies, the root causes of which include rules and ontologies. However, these either require access to the complete knowledge graph, which is not always available in a time-constrained situation, or assume that only generation rules can be refined but not ontologies. In the past, we proposed a rule-driven method for detecting and resolving inconsistencies without complete knowledge graph access, but it requires a predefined set of refinements to the rules and does not guide users with respect to the order in which the rules should be inspected. We extend our previous work with a rule-driven method, called Resglass, that considers refinements for generation rules as well as ontologies. In this article, we describe Resglass, which includes a ranking to determine the order in which rules and ontology elements should be inspected, and its implementation. The ranking is evaluated by comparing the manual ranking of experts to our automatic ranking. The evaluation shows that our automatic ranking achieves an overlap of 80% with the experts’ ranking, thereby reducing the effort required during the resolution of inconsistencies in both rules and ontologies.".
- SW-200384 abstract "The correct functioning of Semantic Web applications requires that given RDF graphs adhere to an expected shape. This shape depends on the RDF graph and the application’s supported entailments of that graph. During validation, RDF graphs are assessed against sets of constraints, and found violations help refine the RDF graphs. However, existing validation approaches cannot always explain the root causes of violations (inhibiting refinement), and cannot fully match the entailments supported during validation with those supported by the application. These approaches cannot accurately validate RDF graphs, or combine multiple systems, deteriorating the validator’s performance. In this paper, we present an alternative validation approach using rule-based reasoning, capable of fully customizing the used inferencing steps. We compare to existing approaches, and present a formal grounding and a practical implementation, “Validatrr”, based on N3Logic and the EYE reasoner. Our approach – supporting an equivalent number of constraint types compared to the state of the art – better explains the root cause of the violations due to the reasoner’s generated logical proof, and returns an accurate number of violations due to the customizable inferencing rule set. Performance evaluation shows that Validatrr is performant for smaller datasets, and scales linearly w.r.t. the RDF graph size. The detailed root cause explanations can guide future validation report description specifications, and the fine-grained level of configuration can be employed to support different constraint languages. This foundation allows further research into handling recursion, validating RDF graphs based on their generation description, and providing automatic refinement suggestions.".
- info3040790 abstract "Metadata has been around and has evolved for centuries, albeit not recognized as such. Medieval manuscripts typically had illuminations at the start of each chapter, being both a kind of signature for the author writing the script and a pictorial chapter anchor for the illiterates at the time. Nowadays, there is so much fragmented information on the Internet that users sometimes fail to distinguish the real facts from some bended truth, let alone being able to interconnect different facts. Here, metadata can act both as a noise reducer for detailed recommendations to end-users and as a catalyst to interconnect related information. Over time, metadata thus not only has had different modes of information, but furthermore, metadata’s relation of information to meaning, i.e., “semantics”, evolved. Darwin’s evolutionary propositions, from “species have an unlimited reproductive capacity”, through “natural selection”, to “the cooperation of mutations leads to adaptation to the environment” show remarkable parallels to both metadata’s different modes of information and to its relation of information to meaning over time. In this paper, we will show that the evolution of the use of (meta)data can be mapped to Darwin’s nine evolutionary propositions. As mankind and its behavior are products of an evolutionary process, the evolutionary process of metadata with its different modes of information is on the verge of a new, semantic era.".
- isqv23n4.2011.04 abstract "Linked Data hold the promise to derive additional value from existing data throughout different sectors, but practitioners currently lack a straightforward methodology and the tools to experiment with Linked Data. This article gives a pragmatic overview of how general purpose Interactive Data Transformation tools (IDTs) can be used to perform the two essential steps to bring data into the Linked Data cloud: data cleaning and reconciliation. These steps are explained with the help of freely available data (Cooper-Hewitt National Design Museum, New York) and tools (Google Refine), making the process repeatable and understandable for practitioners.".
- IJSWIS.2016070103 abstract "Evaluating federated Linked Data queries requires consulting multiple sources on the Web. Before a client can execute queries, it must discover data sources, and determine which ones are relevant. Federated query execution research focuses on the actual execution, while data source discovery is often marginally discussed—even though it has a strong impact on selecting sources that contribute to the query results. Therefore, we introduce a discovery approach for Linked Data interfaces based on hypermedia links and controls, and apply it to federated query execution with Triple Pattern Fragments. In addition, we identify quantitative metrics to evaluate this discovery approach. This article describes generic evaluation measures and results for our concrete approach. With low-cost data summaries as seed, interfaces to eight large real-world datasets can discover each other within 7 minutes. Hypermedia-based client-side querying shows a promising gain of up to 50% in execution time, but demands algorithms that visit a higher number of interfaces to improve result completeness. With these findings, we conclude that using hypermedia for interface discovery brings us closer to a queryable global dataspace. However, clients require more intelligent methods to effectively consume such space.".
- IJSWIS.2017100108 abstract "When researchers formulate search queries to find relevant content on the Web, those queries typically consist of keywords that can only be matched in the content or its metadata. The Web of Data extends this functionality by bringing structure and giving well-defined meaning to the content, and it enables humans and machines to work together using controlled vocabularies. Due to the high degree of mismatches between the structure of the content and the vocabularies in different sources, searching over multiple heterogeneous repositories of structured data is considered challenging. Therefore, we present a semantic search engine for researchers facilitating search in research related Linked Data. To facilitate high-precision interactive search, we configured the engine to annotate and interlink structured research data with ontologies from various repositories in an effective semantic model. Furthermore, our system is adaptive as researchers can choose to add their social media accounts for synchronization and efficiently explore new datasets.".
- 0006733106710679 abstract "Modern developments confront us with an ever-increasing amount of streaming data: different sensors in environments like hospitals or factories communicate their measurements to other applications. Having this data at our disposal confronts us with a new challenge: the data needs to be integrated into existing frameworks. As the availability of sensors can rapidly change, these frameworks need to be flexible enough to easily incorporate new systems without having to be explicitly configured. Semantic Web applications offer a solution for that, enabling computers to “understand” data. But for them, the sheer amount of data and the different possible queries that can be performed on it can form an obstacle. This paper tackles this problem: we present a formalism to describe stream queries in the ontology context in which they might become relevant. These descriptions enable us to automatically decide, based on the actual setting and the problem to be solved, which sensors should be monitored further and how. This helps us to limit the streaming data taken into account for reasoning tasks and make stream reasoning more performant. We illustrate our approach on a health-care use case where different sensors are used to measure data on patients and their surroundings in a hospital.".
- jwe1540-9589.2045 abstract "Public transit operators often publish their open data in a data dump, but developers with limited computational resources may not have the means to process all this data efficiently. In our prior work we have shown that geospatially partitioning an operator’s network can improve query times for client-side route planning applications by a factor of 2.4. However, it remains unclear whether this works for all network types, or other kinds of applications. To answer these questions, we must evaluate the same method on more networks and analyze the effect of geospatial partitioning on each network separately. In this paper we process three networks in Belgium: (i) the national railways, (ii) the regional operator in Flanders, and (iii) the network of the city of Brussels, using both real and artificially generated query sets. Our findings show that on the regional network, we can make query processing 4 times more efficient, but we could not improve the performance over the city network by more than 12%. Both the network’s topography, and to a lesser extent how users interact with the network, determine how suitable the network is for partitioning. Thus, we come to a negative answer to our question: our method does not work equally well for all networks. Moreover, since the network’s topography is the main determining factor, we expect this finding to apply to other graph-based geospatial data, as well as other Link Traversal-based applications.".
- decentralized-footpaths abstract "Users expect route planners that combine all modes of transportation to propose good journeys to their destination. These route planners use data from several sources such as road networks and schedule-based public transit. We focus on the link between the two; specifically, the walking distances between stops. Research in this field so far has found that computing these paths dynamically is too slow, but that computing all of them results in a quadratically scaling number of edges, which is prohibitively expensive in practice. The common solution is to cluster the stops into small unconnected graphs, but this restricts the amount of walking and has a significant impact on the travel times. Moreover, clustering operates on a closed-world assumption, which makes it impractical to add additional public transit services. A decentralized publishing strategy that fixes these issues should thus (i) scale gracefully with the number of stops; (ii) support unrestricted walking; (iii) make it easy to add new services and (iv) support splitting the work among several actors. We introduce a publishing strategy that is based on the Delaunay triangulation of public transit stops, where every triangle edge corresponds to a single footpath that is precomputed. This guarantees that all stops are reachable from one another, while the number of precomputed paths increases linearly with the number of stops. Each public transit service can be processed separately, and combining several operators can be done with a minimal amount of work. Approximating the walking distance with a path along the triangle edges overestimates the actual distance by 20% on average. Our results show that our approach is a middle-ground between completeness and practicality. It consistently overestimates the walking distances, but this seems workable since overestimating the time needed to catch a connection is arguably better than recommending an impossible journey. The estimates could still be improved by combining the great-circle distance with our approximations. Alternatively, different triangulations could be combined to create a more complete graph.". (A triangulation sketch follows after this list.)
- EYE.pdf abstract "This issue’s installment examines a software program reasoning about the world’s largest knowledge source. Ruben Verborgh and Jos De Roo describe how a small open source project can have a large impact. This is the fourth open source product discussed in the Impact department and the first written in the logic programming language Prolog.".
- icwe2020-main-track abstract "Web-based information services transformed how we interact with public transport. Discovering alternatives to reach destinations and obtaining live updates about them is necessary to optimize journeys and improve the quality of travellers’ experience. However, keeping travellers updated with opportune information is demanding. Traditional Web APIs for live public transport data follow a polling approach and allocate all data processing to either data providers, lowering data accessibility, or data consumers, increasing the costs of innovative solutions. Moreover, data processing load increases further because previously obtained route plans are fully recalculated when live updates occur. In-between solutions that share processing load between clients and servers, and alternative Web API architectures, have not been thoroughly investigated yet. We study performance trade-offs of polling and push-based Web architectures to efficiently publish and consume live public transport data. We implement (i) alternative architectures that allow sharing data processing load between clients and servers, and evaluate their performance following polling- and push-based approaches; (ii) a rollback mechanism that extends the Connection Scan Algorithm to avoid unnecessary full route plan recalculations upon live updates. Evaluations show polling as a more efficient alternative on CPU and RAM but hint towards push-based alternatives when bandwidth is a concern. Clients update route plan results 8–10 times faster with our rollback approach. Smarter API design combining polling and push-based Web interfaces for live public transport data reduces the intrinsic costs of data sharing by equitably distributing the processing load between clients and servers. Future work can investigate more complex multimodal transport scenarios.". (A basic Connection Scan sketch follows after this list.)
- iswc2021-in-use abstract "The European Union Agency for Railways acts as a European authority, tasked with the provision of a legal and technical framework to support harmonized and safe cross-border railway operations throughout the EU. So far, the agency relied on traditional application-centric approaches to support the data exchange among multiple actors interacting within the railway domain. This led, however, to isolated digital environments that consequently added barriers to digital interoperability while increasing the cost of maintenance and innovation. In this work, we show how Semantic Web technologies are leveraged to create a semantic layer for data integration across the base registries maintained by the agency. We validate the usefulness of this approach by supporting route compatibility checks, a highly demanded use case in this domain, which was not available over the agency’s registries before. Our contributions include (i) an official ontology for the railway infrastructure and authorized vehicle types, including 28 reference datasets; (ii) a reusable Knowledge Graph describing the European railway infrastructure; (iii) a cost-efficient system architecture that enables high flexibility for use case development; and (iv) an open-source and RDF-native Web application to support route compatibility checks. This work demonstrates how data-centric system design, powered by Semantic Web technologies and Linked Data principles, provides a framework to achieve data interoperability and unlock new and innovative use cases and applications. Based on the results obtained during this work, ERA officially decided to make Semantic Web and Linked Data-based approaches the default setting for any future development of the data, registers and specifications under the agency’s remit for data exchange mandated by the EU legal framework. The next steps, which are already underway, include further developing and bringing these solutions to a production-ready state.".
- Article-System-Components abstract "A common practice within object-oriented software is using composition to realize complex object behavior in a reusable way. Such compositions can be managed by Dependency Injection (DI), a popular technique in which components only depend on minimal interfaces and have their concrete dependencies passed into them. Instead of requiring program code, this separation enables describing the desired instantiations in declarative configuration files, such that objects can be wired together automatically at runtime. Configurations for existing DI frameworks typically only have local semantics, which limits their usage in other contexts. Yet some cases require configurations to be usable outside of their local scope, such as for the reproducibility of experiments, static program analysis, and semantic workflows. As such, there is a need for globally interoperable, addressable, and discoverable configurations, which can be achieved by leveraging Linked Data. We created Components.js as an open-source semantic DI framework for TypeScript and JavaScript applications, providing global semantics via Linked Data-based configuration files. In this article, we report on the Components.js framework by explaining its architecture and configuration, and discuss its impact by mentioning where and how applications use it. We show that Components.js is a stable framework that has seen significant uptake during the last couple of years. We recommend it for software projects that require high flexibility, configuration without code changes, sharing configurations with others, or applying these configurations in other contexts such as experimentation or static program analysis. We anticipate that Components.js will continue driving concrete research and development projects that require high degrees of customization to facilitate experimentation and testing, including the Comunica query engine and the Community Solid Server for decentralized data publication.". (A simplified declarative-wiring sketch follows after this list.)
- describing-experiments abstract "Within computer science engineering, research articles often rely on software experiments in order to evaluate contributions. Reproducing such experiments involves setting up software, benchmarks, and test data. Unfortunately, many articles ambiguously refer to software by name only, leaving out crucial details for reproducibility, such as module and dependency version numbers or the configuration of individual components in different setups. To address this, we created the Object-Oriented Components ontology for the semantic description of software components and their configuration. This article discusses the ontology and its application, and demonstrates with a use case how to publish experiments and their software configurations on the Web. In order to enable semantic interlinking between configurations and modules, we published the metadata of all 480,000+ JavaScript libraries on npm as 194,000,000+ RDF triples. Through our work, research articles can refer by URL to fine-grained descriptions of experimental setups. This speeds up accurate reproduction of experiments, and facilitates the evaluation of new research contributions with different software configurations. In the future, software could be instantiated automatically based on these descriptions and configurations, and reasoning and querying could be applied to software configurations for meta-research purposes.". (A sketch of a configuration published as RDF follows after this list.)
- cs-105 abstract "While most challenges organized so far in the Semantic Web domain are focused on comparing tools with respect to different criteria such as their features and competencies, or exploiting semantically enriched data, the Semantic Web Evaluation Challenges series, co-located with the ESWC Semantic Web Conference, aims to compare them based on their output, namely the produced dataset. The Semantic Publishing Challenge is one of these challenges. Its goal is to involve participants in extracting data from heterogeneous sources on scholarly publications, and producing Linked Data that can be exploited by the community itself. This paper reviews lessons learned from both (i) the overall organization of the Semantic Publishing Challenge, regarding the definition of the tasks, building the input dataset and forming the evaluation, and (ii) the results produced by the participants, regarding the proposed approaches, the used tools, the preferred vocabularies and the results produced in the three editions of 2014, 2015 and 2016. We compared these lessons to other Semantic Web Evaluation challenges. In this paper, we (i) distill best practices for organizing such challenges that could be applied to similar events, and (ii) report observations on Linked Data publishing derived from the submitted solutions. We conclude that higher quality may be achieved when Linked Data is produced as a result of a challenge, because the competition becomes an incentive, while solutions become better with respect to Linked Data publishing best practices when they are evaluated against the rules of the challenge.".
- cs-110 abstract "Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task that does not scale. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and that it can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.".
- cs-78 abstract "Publication and archival of scientific results is still commonly considered the responsibility of classical publishing companies. Classical forms of publishing, however, which center around printed narrative articles, no longer seem well-suited in the digital age. In particular, there currently exist no efficient, reliable, and agreed-upon methods for publishing scientific datasets, which have become increasingly important for science. In this article, we propose to design scientific data publishing as a web-based bottom-up process, without top-down control of central authorities such as publishing companies. Based on a novel combination of existing concepts and technologies, we present a server network to decentrally store and archive data in the form of nanopublications, an RDF-based format to represent scientific data. We show how this approach allows researchers to publish, retrieve, verify, and recombine datasets of nanopublications in a reliable and trustworthy manner, and we argue that this architecture could be used as a low-level data publication layer to serve the Semantic Web in general. Our evaluation of the current network shows that this system is efficient and reliable.". (A minimal nanopublication sketch follows after this list.)
- article-demo abstract "The collection of Linked Data is ever-growing and many datasets are frequently being updated. In order to fully exploit the potential of the information that is available in and over historical dataset versions, we need to be able to store and query Linked Datasets efficiently. In this demonstration, we introduce OSTRICH, which is an efficient multi-version triple store with versioned querying support. We demonstrate the capabilities of OSTRICH using a Web-based graphical user interface in which a store can be opened or created. Using this interface, the user is able to query in, between, and over different versions, ingest new versions, and retrieve summarizing statistics.".
- article-iswc2019-journal-ostrich abstract "In addition to their latest version, Linked Open Datasets on the Web can also contain useful information in or between previous versions. In order to exploit this information, we can maintain history in RDF archives. Existing approaches either require much storage space, or they do not meet sufficiently expressive querying demands. In this extended abstract, we discuss an RDF archive indexing technique that has a low storage overhead, and adds metadata for reducing lookup times. We introduce algorithms based on this technique for efficiently evaluating versioned queries. Using the BEAR RDF archiving benchmark, we evaluate our implementation, called OSTRICH. Results show that OSTRICH introduces a new trade-off regarding storage space, ingestion time, and querying efficiency. By processing and storing more metadata during ingestion time, it significantly lowers the average lookup time for versioning queries. Our storage technique reduces query evaluation time through a preprocessing step during ingestion, which only in some cases increases storage space when compared to other approaches. This allows data owners to store and query multiple versions of their dataset efficiently, lowering the barrier to historical dataset publication and analysis.".
- article-jws2018-ostrich abstract "When publishing Linked Open Datasets on the Web, most attention is typically directed to their latest version. Nevertheless, useful information is present in or between previous versions. In order to exploit this historical information in dataset analysis, we can maintain history in RDF archives. Existing approaches either require much storage space, or they expose an insufficiently expressive or efficient interface with respect to querying demands. In this article, we introduce an RDF archive indexing technique that is able to store datasets with a low storage overhead, by compressing consecutive versions and adding metadata for reducing lookup times. We introduce algorithms based on this technique for efficiently evaluating queries at a certain version, between any two versions, and for versions. Using the BEAR RDF archiving benchmark, we evaluate our implementation, called OSTRICH. Results show that OSTRICH introduces a new trade-off regarding storage space, ingestion time, and querying efficiency. By processing and storing more metadata during ingestion time, it significantly lowers the average lookup time for versioning queries. OSTRICH performs better for many smaller dataset versions than for few larger dataset versions. Furthermore, it enables efficient offsets in query result streams, which facilitates random access in results. Our storage technique reduces query evaluation time for versioned queries through a preprocessing step during ingestion, which only in some cases increases storage space when compared to other approaches. This allows data owners to store and query multiple versions of their dataset efficiently, lowering the barrier to historical dataset publication and analysis.". (An illustrative interface for these three versioned query types follows after this list.)
- article-ldtraversal-security-short abstract "The societal and economic consequences surrounding Big Data-driven platforms have increased the call for decentralized solutions. However, retrieving and querying data in more decentralized environments requires fundamentally different approaches, whose properties are not yet well understood. Link-Traversal-based Query Processing (LTQP) is a technique for querying over decentralized data networks, in which a client-side query engine discovers data by traversing links between documents. Since decentralized environments are potentially unsafe due to their non-centrally controlled nature, there is a need for client-side LTQP query engines to be resistant against security threats aimed at the query engine’s host machine or the query initiator’s personal data. As such, we have performed an analysis of potential security vulnerabilities of LTQP. This article provides an overview of security threats in related domains, which are used as inspiration for the identification of 10 LTQP security threats. This list of security threats forms a basis for future work in which mitigations for each of these threats need to be developed and tested for their effectiveness. With this work, we start filling in the unknowns for enabling query execution over decentralized environments. Aside from future work on security, wider research will be needed to uncover missing building blocks for enabling true data decentralization.". (A naive link-traversal loop sketch follows after this list.)
- WhatsInAPod abstract "The Solid vision aims to make data independent of applications through technical specifications, which detail how to publish and consume permissioned data across multiple autonomous locations called “pods”. The current document-centric interpretation of Solid, wherein a pod is solely a hierarchy of Linked Data documents, cannot fully realize this envisaged independence. Applications are left to define their own APIs within the Solid Protocol, which leads to fundamental interoperability problems and the need for associated workarounds. The broader long-term vision for Solid is confounded with the concrete HTTP interface to pods today, leading to a narrower solution space to address these core issues. We examine the mismatch between the vision and the prevalent document-centric interpretation, and propose a reconciliatory graph-centric interpretation wherein a pod is fundamentally a knowledge graph. In this article, we contrast the existing and proposed interpretations in terms of how they support the Solid vision. We argue that the knowledge-centric interpretation can improve pod access through different Web APIs that act as views in a database sense. We show that our zoomed-out interpretation provides improved opportunities for storage, publication, and querying of decentralized data in more flexible and sustainable ways. These insights are crucial to reduce the dependency of Solid apps on implicit API semantics and local assumptions about the shape and organization of data and the resulting performance. The suggested broader interpretation can guide Solid through its evolution into a heterogeneous yet interoperable ecosystem that accommodates a multitude of read/write data access patterns.". (A sketch of querying a pod as a knowledge graph follows after this list.)
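
The decentralized-footpaths abstract builds its footpath graph from the Delaunay triangulation of the stops. The sketch below illustrates that idea under assumptions of my own (the `Stop` shape, the use of the d3-delaunay library, and treating longitude/latitude as planar coordinates are simplifications not taken from the paper): every triangle edge becomes one precomputed footpath, so the number of footpaths grows linearly with the number of stops.

```typescript
// Sketch: a linear-size footpath graph from the Delaunay triangulation of
// public transit stops. Stop shape and library choice are assumptions.
import { Delaunay } from "d3-delaunay";

interface Stop { id: string; lon: number; lat: number; }

const EARTH_RADIUS_M = 6371000;

// Great-circle (haversine) distance in metres between two stops.
function haversine(a: Stop, b: Stop): number {
  const rad = (d: number) => (d * Math.PI) / 180;
  const dLat = rad(b.lat - a.lat);
  const dLon = rad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(rad(a.lat)) * Math.cos(rad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// Each Delaunay triangle edge becomes one precomputed footpath, so the number
// of footpaths grows linearly with the number of stops instead of quadratically.
function footpathGraph(stops: Stop[]): Map<string, number> {
  const delaunay = Delaunay.from(stops, s => s.lon, s => s.lat);
  const { triangles } = delaunay;
  const edges = new Map<string, number>(); // "i|j" -> walking distance estimate
  for (let t = 0; t < triangles.length; t += 3) {
    for (const [u, v] of [[0, 1], [1, 2], [2, 0]]) {
      const i = triangles[t + u];
      const j = triangles[t + v];
      const key = i < j ? `${i}|${j}` : `${j}|${i}`;
      if (!edges.has(key)) edges.set(key, haversine(stops[i], stops[j]));
    }
  }
  return edges;
}
```

A planar triangulation of n points has only a linear number of edges, which is the scaling property the abstract relies on; a routing engine could then approximate the walking distance between any two stops with a shortest-path search over these edges, which is where the reported average overestimation of 20% comes from.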
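
The icwe2020-main-track abstract compares polling- and push-based Web APIs for live public transport data. The following sketch contrasts the two consumption styles on the client side; the endpoint URLs, the update payload shape, and the 30-second polling interval are invented for illustration.

```typescript
// Sketch: the two consumption styles compared in the icwe2020-main-track
// abstract. Endpoints and payload shape are hypothetical.

interface ConnectionUpdate { connectionId: string; delaySeconds: number; }

// Polling: the client repeatedly asks the server for the latest updates.
function pollUpdates(onUpdate: (updates: ConnectionUpdate[]) => void): void {
  setInterval(async () => {
    const response = await fetch("https://transit.example/connections/latest");
    onUpdate(await response.json() as ConnectionUpdate[]);
  }, 30_000); // poll every 30 seconds
}

// Push: the server streams updates over Server-Sent Events as they occur
// (EventSource is a browser API; Node.js needs an SSE client library).
function subscribeUpdates(onUpdate: (update: ConnectionUpdate) => void): void {
  const source = new EventSource("https://transit.example/connections/events");
  source.onmessage = event =>
    onUpdate(JSON.parse(event.data) as ConnectionUpdate);
}
```

Which of the two is cheaper depends on the factors the abstract evaluates: polling may re-transfer data that has not changed, while push keeps a connection open per client.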
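
The Article-System-Components abstract describes dependency injection driven by declarative, Linked Data-based configuration files. The sketch below shows the underlying mechanism in a deliberately simplified form; it is not the actual Components.js API or configuration format, and the keys ("@id", "type", "params") are stand-ins of my own for the idea of globally identifiable, declaratively wired component instances.

```typescript
// Sketch: declarative wiring of components from a configuration, rather than
// from program code. Simplified illustration; not the Components.js API.

type ComponentConfig = {
  "@id": string;                   // global identifier of this instance
  type: string;                    // which registered class to construct
  params: Record<string, unknown>; // literal values or { "@id": ... } references
};

// Classes register themselves under a type name; instances are cached by "@id".
const registry = new Map<string, new (params: Record<string, unknown>) => unknown>();
const instances = new Map<string, unknown>();

function instantiate(configs: ComponentConfig[], id: string): unknown {
  if (instances.has(id)) return instances.get(id);
  const config = configs.find(c => c["@id"] === id);
  if (!config) throw new Error(`No configuration found for ${id}`);

  // Resolve parameters: plain values stay as-is; objects with "@id" are
  // references to other configured components and get instantiated first.
  const params: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(config.params)) {
    params[key] =
      typeof value === "object" && value !== null && "@id" in value
        ? instantiate(configs, (value as { "@id": string })["@id"])
        : value;
  }

  const Ctor = registry.get(config.type);
  if (!Ctor) throw new Error(`No class registered for ${config.type}`);
  const instance = new Ctor(params);
  instances.set(id, instance);
  return instance;
}
```

Components.js goes beyond this by giving configurations global semantics: because instances and parameters are identified as Linked Data, the same configuration can be addressed, shared, and reused outside the application that executes it.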
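
The describing-experiments abstract proposes publishing software configurations as Linked Data so that articles can reference them by URL. The sketch below serializes a small, invented configuration as RDF with the n3.js library; the vocabulary namespace and property names are placeholders, not the actual Object-Oriented Components ontology terms.

```typescript
// Sketch: a (hypothetical) experiment setup described as RDF triples, so that
// an article can point to it by URL. The namespace and predicates below are
// placeholders, not the real Object-Oriented Components ontology.
import { Writer, DataFactory } from "n3";

const { namedNode, literal, quad } = DataFactory;
const SW = "https://example.org/vocab/software#"; // placeholder vocabulary
const setup = namedNode("https://example.org/experiments/run1#engine");

const writer = new Writer({ prefixes: { sw: SW } });
writer.addQuad(quad(setup, namedNode(`${SW}module`), literal("my-query-engine"))); // invented module name
writer.addQuad(quad(setup, namedNode(`${SW}version`), literal("2.3.1")));
writer.addQuad(quad(setup, namedNode(`${SW}pageSize`), literal("100")));
writer.end((error, turtle) => {
  if (error) throw error;
  console.log(turtle); // Turtle document, ready to publish at a stable URL
});
```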
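
The cs-78 abstract centers on nanopublications as an RDF-based unit of scientific data. Below is a minimal, invented example of the four named graphs a nanopublication bundles (head, assertion, provenance, publication info), serialized as TriG and parsed with n3.js; the subject, claim, and author are placeholders, and real nanopublications typically also carry trusty URIs and signatures that are omitted here.

```typescript
// Sketch: the named-graph structure of a nanopublication, with placeholder
// content. Only the np: bundling structure follows the nanopublication schema.
import { Parser } from "n3";

const nanopub = `
@prefix np:   <http://www.nanopub.org/nschema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <https://example.org/np1#> .

ex:head {
  ex:np1 a np:Nanopublication ;
    np:hasAssertion ex:assertion ;
    np:hasProvenance ex:provenance ;
    np:hasPublicationInfo ex:pubinfo .
}
ex:assertion {
  ex:gene1 ex:isRelatedTo ex:disease1 .   # the actual scientific claim
}
ex:provenance {
  ex:assertion prov:wasDerivedFrom ex:experiment1 .
}
ex:pubinfo {
  ex:np1 dct:creator <https://orcid.org/0000-0000-0000-0000> ;
    dct:created "2024-01-01T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
}
`;

// The TriG above parses into quads that a server in the network could store
// and serve for retrieval, verification, and recombination.
const quads = new Parser({ format: "TriG" }).parse(nanopub);
console.log(`${quads.length} quads in this nanopublication`);
```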
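
The three OSTRICH abstracts (article-demo, article-iswc2019-journal-ostrich, article-jws2018-ostrich) revolve around querying in, between, and over versions. The interface below is an illustrative TypeScript shape for those three query types, using the names commonly given to them in the RDF-archiving literature (version materialization, delta materialization, version queries); it is not the actual OSTRICH API, and the method names and option fields are assumptions.

```typescript
// Sketch: an illustrative interface for versioned triple-pattern queries over
// an RDF archive. Not the real OSTRICH API; names and options are assumed.

type Term = string; // simplified RDF term
interface Triple { subject: Term; predicate: Term; object: Term; }
interface VersionedTriple extends Triple { versions: number[]; }
interface DeltaTriple extends Triple { addition: boolean; }

interface VersionedTripleStore {
  // Version Materialization (VM): the dataset as it was at one version.
  queryVersionMaterialized(pattern: Partial<Triple>, version: number,
    options?: { offset?: number; limit?: number }): Promise<Triple[]>;

  // Delta Materialization (DM): additions and deletions between two versions.
  queryDeltaMaterialized(pattern: Partial<Triple>, versionStart: number,
    versionEnd: number): Promise<DeltaTriple[]>;

  // Version Query (VQ): matches annotated with the versions they occur in.
  queryVersion(pattern: Partial<Triple>): Promise<VersionedTriple[]>;

  // Ingest a new version as a changeset against the previous one.
  append(version: number, additions: Triple[], deletions: Triple[]): Promise<number>;
}
```

The offset/limit options mirror the journal abstract's point that efficient offsets into query result streams enable random access to results.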
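
The article-ldtraversal-security-short abstract analyses security threats for Link-Traversal-based Query Processing. To make the attack surface concrete, the sketch below implements a naive traversal loop of the kind such engines build on: it dereferences a seed document, parses the RDF, and follows every IRI it discovers. The seed URL, the Turtle-only content negotiation, and the document cap are illustrative choices of mine, not taken from the article.

```typescript
// Sketch: a naive link-traversal loop. Every followed link triggers a request
// from the client's machine, which is where the client-side risks arise.
import { Parser, Quad } from "n3";

async function traverse(seed: string, maxDocuments = 50): Promise<Quad[]> {
  const queue = [seed];
  const visited = new Set<string>();
  const quads: Quad[] = [];

  while (queue.length > 0 && visited.size < maxDocuments) {
    const url = queue.shift()!;
    if (visited.has(url)) continue;
    visited.add(url);

    const response = await fetch(url, { headers: { accept: "text/turtle" } });
    if (!response.ok) continue;
    const parsed = new Parser({ baseIRI: url }).parse(await response.text());
    quads.push(...parsed);

    // Follow every IRI found in the data: remote documents thus steer which
    // requests the client makes next and which data enters the query engine.
    for (const quad of parsed) {
      for (const term of [quad.subject, quad.object]) {
        if (term.termType === "NamedNode" && term.value.startsWith("http")) {
          queue.push(term.value.split("#")[0]);
        }
      }
    }
  }
  return quads;
}
```

Because discovered documents decide what gets fetched and ingested next, a malicious or misbehaving source can influence the engine's behaviour, which is the kind of exposure the article's threat analysis is concerned with.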
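
The WhatsInAPod abstract argues for treating a pod as a knowledge graph accessed through different views rather than as a fixed document hierarchy. The sketch below hints at what that means for clients: a SPARQL query evaluated with the Comunica engine over a pod resource, where the query expresses what data is needed instead of hard-coding where documents live. The pod URL and the FOAF query are invented examples.

```typescript
// Sketch: querying data in a pod declaratively instead of fetching documents
// by path. The pod resource URL and the query are hypothetical examples.
import { QueryEngine } from "@comunica/query-sparql";

async function listContacts(podResource: string): Promise<string[]> {
  const engine = new QueryEngine();
  const bindingsStream = await engine.queryBindings(`
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?contact WHERE { ?person foaf:knows ?contact. }`, {
    sources: [podResource] as [string], // a document in the pod (or a future query view)
  });
  const bindings = await bindingsStream.toArray();
  return bindings.map(b => b.get("contact")!.value);
}

// Usage against a hypothetical pod document:
// listContacts("https://alice.pod.example/profile/card").then(console.log);
```

Under a document-centric reading the client must know in which document the contacts live; under the graph-centric interpretation the article proposes, the same query could in principle be answered by whichever view the pod exposes.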