Matches in Ruben’s data for { ?s <http://schema.org/abstract> ?o }
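The listing below contains one item per match of this single triple pattern. As a hedged illustration of how such a pattern could be evaluated, the sketch below uses Python with rdflib over a hypothetical local copy of the data (the file name ruben-data.ttl is an assumption, not an actual artifact of this dataset); a second sketch, after the list, pages through the same pattern over a live Triple Pattern Fragments interface.

```python
# Minimal sketch: evaluate { ?s <http://schema.org/abstract> ?o } over a local copy of the data.
# Assumption: a Turtle dump of the dataset is available as "ruben-data.ttl" (hypothetical file name).
from rdflib import Graph, URIRef

ABSTRACT = URIRef("http://schema.org/abstract")

g = Graph()
g.parse("ruben-data.ttl", format="turtle")

# Each (subject, object) pair is one match of the pattern; abstracts are long, so truncate for display.
for subject, abstract in g.subject_objects(ABSTRACT):
    print(f"{subject}\n  {str(abstract)[:120]}…\n")
```

Running this against the same data should reproduce the subject/abstract pairs listed below, modulo ordering.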
- publication abstract "Making Linked Data queryable on the Web is not an easy task for publishers, for technical and logistical reasons. Can they afford to offer a SPARQL endpoint, or should they offer an API or data dump instead? And what technical knowledge is needed for that? This demo presents a user-friendly pipeline to compose APIs for Linked Datasets, consisting of a customizable set of reusable features, e.g., Triple Pattern Fragments, substring search, membership metadata, etc. These APIs indicate their supported features in hypermedia responses, so that clients can discover which server-provided functionality they understand, and divide the evaluation of SPARQL queries accordingly between client and server. That way, publishers can determine the complexity of the resulting API, and thus the maximal set of server tasks. This demo shows how publishers can easily set up an API with this pipeline, and demonstrates the client-side execution of federated SPARQL queries against such APIs.".
- publication abstract "We will show that semantically annotated paths lead to discovering meaningful, non-trivial relations and connections between multiple resources in large online datasets such as the Web of Data. Graph algorithms have always been key in pathfinding applications (e.g., navigation systems). They make optimal use of available computation resources to find paths in structured data. Applying these algorithms to Linked Data can facilitate the resolving of complex queries that involve the semantics of the relations between resources. In this paper, we introduce a new approach for finding paths in Linked Data that takes into account the meaning of the connections and also deals with scalability. An efficient technique combining pre-processing and indexing of datasets is used for finding paths between two resources in large datasets within a couple of seconds. To demonstrate our approach, we have implemented a testcase using the DBpedia dataset.".
- publication abstract "Social media, web collaboration tools, co-authorship and citation networks are typically not associated. The various ways of interacting with this kind of data for scientific and research purposes remain distinct. In this paper, we propose a solution to bring such information together and visualize it. We particularly developed an exploratory visualization of research networks as a combination of co-authorship, citations and social media interactions. The result is a scholar centered, multi-perspective view of conferences and people based on their collaborations and online interactions. We measured the relevance and user acceptance of this type of interactive visualization. Preliminary results indicate a high precision both for recognized people and conferences. The majority in a group of test-users responded positively to a set of statements about the acceptance.".
- publication abstract "When building reliable data-driven applications for local governments to interact with public servants or citizens, data publishers and consumers have to be sure that the applied data structure and schema definition are accurate and lead to reusable data. To understand the characteristics of reusable local government data, we motivate how the process of developing a semantically enriched exchange standard contributes to resolving this issue. This standard is used, for example, to describe contact information for public services which supports a representative pilot for opening up a variety of local government data. Our main finding after implementing this pilot is that converging on semantics and specifying detailed use of the model that forms the core of such a standard leads to reusable government data.".
- publication abstract "There exists an abundance of Linked Data storage solutions, but only few meet the requirements of a production environment with interlinked life sciences data. In such environments, a triple store has to support complex SPARQL queries and handle large datasets with hundreds of millions of triples. The Ontoforce platform DisQover offers federated search for life sciences, relying on complex federated queries over open life science data, run in an ETL-pipeline to anticipate user actions in its exploratory search interface. Different state-of-the-art approaches for scaling out are compared, both in terms of their ability to execute the queries as in terms of ETL pipeline performance. This paper analyzes and discusses the features of the datasets and query mixes. An in-depth analysis is provided on an individual query basis revealing the strengths and weaknesses with respect to certain query types.".
- publication abstract "Handling Big Data is often hampered by integration challenges, especially when speed, efficiency and accuracy are critical. In Semantic Web, integration is achieved by aligning multiple representations of the same entities, appearing within distributed heterogeneous sources and by annotating data using same vocabularies. However, mapping different sources to RDF separately, generating multiple URIs for distinct instances of same entities, hampers advanced and automated data fusion and prevents benefiting from data semantic representation. We introduce a unified knowledge framework, created already at mapping level, to enable a successful business ecosystem built on top of enriched and combined datasets, proving that the integration of data from multiple heterogeneous sources, produces added value.".
- publication abstract "Despite the significant number of existing tools, incorporating data into the Linked Open Data cloud remains complicated; hence discouraging data owners to publish their data as Linked Data. Unlocking the semantics of published data, even if they are not provided by the data owners, can contribute to surpass the barriers posed by the low availability of Linked Data and come closer to the realisation of the envisaged Semantic Web. RML, a generic mapping language based on an extension over R2RML, the W3C standard for mapping relational databases into RDF, offers a uniform way of defining the mapping rules for data in heterogeneous formats. In this paper, we present how we adjusted our prototype RML Processor, taking advantage of RML’s scalability, to extract and map data of workshop proceedings published in HTML to the RDF data model for the Semantic Publishing Challenge needs.".
- publication abstract "Linked Data visualizations present a rather static view of entities and their relations, despite the semantically rich data. In contrast to the spatial dimension, which is well-exploited, the temporal dimension is neglected. In this demo, we address the temporal dimension, enabling time-dependant visualizations, applied to research networks based on semantically annotated research metadata. Such visualizations can reveal insightful information otherwise not able to be observed.".
- publication abstract "RDF dataset quality assessment is currently performed primarily after data is published. However, there is neither a systematic way to incorporate its results into the dataset nor the assessment to the publishing workflow. Adjustments are manually—but rarely—applied. Nevertheless, the root of the violations which often derive from the mappings that specify how the RDF dataset will be generated, is not identified. We suggest an incremental, iterative and uniform validation workflow for RDF datasets stemming originally from semi-structured data (e.g., CSV, XML, JSON). In this work, we focus on assessing and improving their mappings. We incorporate i) a test-driven approach for assessing the mappings instead of the RDF dataset itself, as mappings reflect how the dataset will be formed when generated; and ii) perform semi-automatic mapping refinements based on the results of the quality assessment. The proposed workflow is applied to different cases, e.g., large, crowdsourced datasets as DBpedia, or newly generated, as iLastic. Our evaluation indicates the efficiency of our workflow, as it improves significantly the overall quality of an RDF dataset in the observed cases.".
- publication abstract "RDF dataset quality assessment is currently performed primarily after data is published. Incorporating its results, by applying corresponding adjustments to the dataset, happens manually and occurs rarely. In the case of (semi-)structured data (e.g., CSV, XML), the root of the violations often derives from the mappings that specify how the RDF dataset will be generated. Thus, we suggest shifting the quality assessment from the RDF dataset to the mapping definitions that generate it. The proposed test-driven approach for assessing mappings relies on RDFUnit test cases applied over mappings specified with RML. Our evaluation is applied to different cases, e.g., DBpedia, and indicates that the overall quality of an RDF dataset is quickly and significantly improved.".
- publication abstract "The root of schema violations for RDF data generated from (semi-)structured data, often derives from mappings, which are repeatedly applied and specify how an RDF dataset is generated. The DBpedia dataset, which derives from Wikipedia infoboxes, is no exception. To mitigate the violations, we proposed in previous work to validate the mappings which generate the data, instead of validating the generated data afterwards. In this work, we demonstrate how mappings validation is applied to DBpedia. DBpedia mappings are automatically translated to RML and validated by RDFUnit. The DBpedia mappings assessment can be frequently executed, because it requires significantly less time compared to validating the dataset. The validation results become available via a user-friendly interface. The DBpedia community takes them into consideration to refine the DBpedia mappings or ontology and thus, increase the dataset quality.".
- publication abstract "Linked Data generation and publication remain challenging and complicated, in particular for data owners who are not Semantic Web experts or tech-savvy. The situation deteriorates when data from multiple heterogeneous sources, accessed via different interfaces, is integrated, and the Linked Data generation is a long-lasting activity repeated periodically, often adjusted and incrementally enriched with new data. Therefore, we propose the RMLWorkbench, a graphical user interface to support data owners administrating their Linked Data generation and publication workflow. The RMLWorkbench’s underlying language is RML, since it allows to declaratively describe the complete Linked Data generation workflow. Thus, any Linked Data generation workflow specified by a user can be exported and reused by other tools interpreting RML.".
- publication abstract "Despite the significant number of existing tools, incorporating data from multiple sources and different formats into the Linked Open Data cloud remains complicated. No mapping formalisation exists to define how to map such heterogeneous sources into RDF in an integrated and interoperable fashion. This paper introduces the RML mapping language, a generic language based on an extension over R2RML, the W3C standard for mapping relational databases into RDF. Broadening RML’s scope, the language becomes source-agnostic and extensible, while facilitating the definition of mappings of multiple heterogeneous sources. This leads to higher integrity within datasets and richer interlinking among resources.".
- publication abstract "Provenance and other metadata are essential for determining ownership and trust. Nevertheless, no systematic approaches were introduced so far in the Linked Data publishing workflow to capture them. Defining such metadata remained independent of the RDF data generation and publishing. In most cases, metadata is manually defined by the data publishers (person-agents), rather than produced by the involved applications (software-agents). Moreover, the generated RDF data and the published one are considered to be one and the same, which is not always the case, leading to pure, condense and often seductive information. This paper introduces an approach that takes into consideration declarative definitions of mapping rules, which define how the RDF data is generated, and data descriptions of raw data that allow to automatically and incrementally generate provenance and metadata information. This way, it is assured that the metadata information is accurate, consistent and complete.".
- publication abstract "Generating Linked Data remains a complicated and intensive engineering process. While different factors determine how a Linked Data generation algorithm is designed, potential alternatives for each factor are currently not considered when designing the tools’ underlying algorithms. Certain design patterns are frequently applied across different tools, covering certain alternatives of a few of these factors, whereas other alternatives are never explored. Consequently, there are no adequate tools for Linked Data generation for certain occasions, or tools with inadequate and inefficient algorithms are chosen. In this position paper, we determine such factors, based on our experiences, and present a preliminary list. These factors could be considered when a Linked Data generation algorithm is designed or a tool is chosen. We investigated which factors are covered by widely known Linked Data generation tools and concluded that only certain design patterns are frequently encountered. By these means, we aim to point out that Linked Data generation is above and beyond bare implementations, and algorithms need to be thoroughly and systematically studied and exploited.".
- publication abstract "Data science increasingly employs cloud-based Web application programming interfaces (APIs) stored in different repositories. However, discovering and connecting suitable APIs by sifting through these repositories for a given application, is difficult due to the lack of rich metadata needed to precisely describe the service and lack of explicit knowledge about the structure and datatypes of Web API inputs and outputs. To address this challenge, we conducted a survey to identify the metadata elements that are crucial to the description of Web APIs and subsequently developed a smartAPI metadata specification that includes 54 API metadata elements divided into five categories: (i) API Metadata, (ii) Service Provider Metadata, (iii) API Operation Metadata, (iv) Operation Parameter Metadata, (v) Operation Response Metadata. Then, we extended the widely used Swagger editor for annotating APIs, to develop a smartAPI editor that captures the APIs’ domain-related and structural characteristics using the FAIR (Findable, Accessible, Interoperable, Reusable) principles. The smartAPI editor enables API developers to reuse existing metadata elements and values by automatically suggesting terms used by other APIs. In addition to making APIs more accessible and interoperable, we integrated the editor with a smartAPI profiler to annotate the API parameters and responses with semantic identifiers. Finally, the annotated APIs are published into a searchable API registry. The registry makes it easier to find, reuse and see how the different APIs are connected together so that complex workflows can be more easily made. Links to the specification, tool and registry are available at: http://smart-api.info/.".
- publication abstract "This volume contains the proceedings of the Posters and Demos Track at the 11th International Conference on Semantic Systems (SEMANTiCS 2015). SEMANTiCS is the annual meeting place for professionals who make semantic computing work, who understand its benefits and encounter its limitations. Every year, SEMANTiCS attracts information managers, IT-architects, software engineers and researchers from organisations ranging from research facilities, NPOs, through public administrations to the largest companies in the world.".
- publication abstract "In this paper, we empirically investigate to what extent an EPUB 3-based e-TextBook can be used to facilitate a "first-class" mobile learning environment. To that end, we created an EPUB 3-based prototype e-TextBook that has been enhanced in terms of both its presentation and representation, meeting requirements that are typically ascribed by the literature to mobile learning environments. Specifically, we integrated three interactive widgets into our e-TextBook that are able to (1) exchange information with each other (inter-widget communication) and (2) that are able to semi-automatically create new content (that is, that are able to act as semi-automatic content providers): a report maker widget, a sine formula widget, and a corresponding interactive graph maker widget. In addition, we integrated different types of learning objects into our e-TextBook, including multimedia objects, objects with augmented reality features (i.e., digital objects that allow for interaction with physical objects), and objects that offer contextualized content. Both the widgets and the learning objects can be used within a unified learning environment. Furthermore, we semantically annotated the learning content in order to improve its discoverability.".
- publication abstract "Scientific publications point to many associated resources, including videos, prototypes, slides, and datasets. However, discovering and accessing these resources is not always straightforward: links could be broken, readers may be offline, or the number of associated resources might make it difficult to keep track of the viewing order. In this paper, we explore potential integration of such resources into the digital version of a scientific publication. Specifically, we evaluate the most common scientific publication formats in terms of their capabilities to implement the desirable attributes of an enhanced publication and to meet the functional goals of an enhanced publication information system: PDF, HTML, EPUB2, and EPUB3. In addition, we present an EPUB3 version of an exemplary publication in the field of computer science, integrating and interlinking an explanatory video and an interactive prototype. Finally, we introduce a demonstrator that is capable of outputting customized scientific publications in EPUB3. By making use of EPUB3 to create an integrated and customizable representation of a scientific publication and its associated resources, we believe that we are able to augment the reading experience of scholarly publications, and thus the effectiveness of scientific communication.".
- publication abstract "The Internet of Things (IoT) consists of many devices and services producing or consuming data over a network, and, by extension, the internet. There are various protocols and data models used by different vendors of things, or middleware, to expose data and APIs to communicate with and consume data of things. The Web of Things (WoT) addresses IoT’s fragmentation by forming a Web-based abstraction layer capable of interconnecting existing IoT platforms, devices, and cloud services and complementing available standards. Specifications of the Web of Things describe data and interaction models exposed to applications, and communication and security requirements for platforms to communicate effectively. At the core of the WoT specifications is the Thing Description, a semantic description of the data and interaction model(s) for a Thing. This helps other Things to perform actions on a Thing, e.g. read or write its properties (the data or state of a Thing). We believe that the WoT can benefit from semantically enhancing a Thing so it contains data values, or state, in the form of self-describing data. This allows powerful semantic processing and reasoning upon its state, and possibly the history of its states. To enable this, we propose a rule-based approach to generate self-describing data from a Thing’s state(s).".
- publication abstract "Non-structured descriptive metadata provide additional benefits for end-user comprehension. However, their unstructured nature minimize their usefulness in an automated, digital context. This article explores the potential and the limits of Named Entity Recognition (NER) and Term Extraction (TE) in unstructured data searches in order to extract some meaningful concepts. These concepts allow us to benefit from improved retrieval and navigation, but they also play a very important role in digital humanities research. Using a case study to promote NER and TE experiments, based on the descriptive fields of the historical archives of Quebec City, the authors assess four third-party entity extractors. In an effort to address both NER and TE to assess named entities, they use a quantitative approach based on precision, recall and F-score calculated on the "gold standard corpus". A second more qualitative approach then leads us to consider the relevance of TE and to address the issue of multilingualism.".
- publication abstract "Modeling domain knowledge as Linked Data is not straightforward for data publishers, because they are domain experts and not Semantic Web specialists. Most approaches that map data to its RDF representation still require users to have knowledge of the underlying implementations, as the mapping definitions remained, so far, tight to their execution. Defining mapping languages enables to decouple the mapping definitions from the implementation. However, user interfaces that enable domain experts to model knowledge and, thus, intuitively define such mapping definitions, based on available input sources, were not thoroughly investigated yet. This paper introduces a non-exhaustive list of desired features to be supported by such a mapping editor, independently of the underlying mapping language; and presents the RMLEditor as prototype interface that implements these features with RML as its underlying mapping language.".
- publication abstract "Obtaining Linked Data by modeling domain-level knowledge derived from input data is not straightforward for data publishers, especially if they are not Semantic Web experts. Developing user interfaces that support domain experts to semantically annotate their data became feasible, as the mapping rules were abstracted from their execution. However, most existing approaches reflect how mappings are typically executed: they offer a single linear workflow, triggered by a particular data source. Alternative approaches were neither thoroughly investigated yet, nor incorporated in any of the existing user interfaces for mappings. In this paper, we generalize the two prevalent approaches for generating mappings of data in databases: database-driven and ontology-driven, to be applicable for any other data structure; and introduce two approaches: model-driven and result-driven.".
- publication abstract "As the amount of generated sensor data is increasing, semantic interoperability becomes an important aspect in order to support efficient data distribution and communication. Therefore, the integration and fusion of (sensor) data is important, as this data is coming from different data sources and might be in different formats. Furthermore, reusable and extensible methods for this integration and fusion are required in order to be able to scale with the growing number of applications that generate semantic sensor data. Current research efforts allow to map sensor data to Linked Data in order to provide semantic interoperability. However, they lack support for multiple data sources, hampering the integration and fusion. Furthermore, the used methods are not available for reuse or are not extensible, which hampers the development of applications. In this paper, we describe how the RDF Mapping Language (RML) and a Triple Pattern Fragments (TPF) server are used to address these shortcomings. The demonstration consists of a micro controller that generates sensor data. The data is captured and mapped to RDF triples using module-specific RML mappings, which are queried from a TPF server.".
- publication abstract "Generating Linked Data based on existing data sources requires the modeling of their information structure. This modeling needs the identification of potential entities, their attributes and the relationships between them and among entities. For databases this identification is not required, because a data schema is always available. However, for other data formats, such as hierarchical data, this is not always the case. Therefore, analysis of the data is required to support RDF term and data type identification. We introduce a tool that performs such an analysis on hierarchical data. It implements the algorithms, Daro and S-Daro, proposed in this paper. Based on our evaluation, we conclude that S-Daro offers a more scalable solution regarding run time, with respect to the dataset size, and provides more complete results.".
- publication abstract "Linked Data can be generated by applying mapping rules on existing (semi-)structured data. The manual creation of these rules involves a costly process for users. Therefore, (semi-)automatic approaches have been developed to assist users. Although, they provide promising results, in use cases where examples of the desired Linked Data are available they do not use the knowledge provided by these examples, resulting in Linked Data that might not be as desired. This in turn requires manual updates of the rules. These examples can in certain cases be easy to create and offer valuable knowledge relevant for the mapping process, such as which data corresponds to entities and attributes, how this data is annotated and modeled, and how different entities are linked to each other. In this paper, we introduce a semi-automatic approach to create rules based on examples for both the existing data and corresponding Linked Data. Furthermore, we made the approach available via the RMLEditor, making it readily accessible for users through a graphical user interface. The proposed approach provides a first attempt to generate a complete Linked Dataset based on user-provided examples, by creating an initial set of rules for the users.".
- publication abstract "In this talk we’ll explore Catmandu and Linked Data Fragments and how they can cooperate to build an environment for data stream processing at large.".
- publication abstract "L’usage du traitement automatique des langues pour la classification et l’annotation documentaire reste aujourd’hui un rêve plus qu’une réalité. Pourtant, plus que jamais, les organisations font face à de grandes difficultés dans la gestion de leurs documents. Les vocabulaires contrôlés permettent certes d’organiser les contenus, mais toutes les organisations ne disposent pas de ressources suffisantes pour en implémenter. Au travers d’une étude de cas dans le secteur pharmaceutique, les auteurs de cet article démontrent comment une organisation établissement de taille réduite peut concevoir un vocabulaire contrôlé et indexer sémantiquement ses contenus et ce, sans dépendance vis-à-vis d’un fournisseur de logiciel grâce à des outils open source. Les données d’évaluation sont mises à disposition afin d’appliquer la méthodologie à d’autres domaines d’application.".
- publication abstract "The organisers have asked Kjetil Kjernsmo to prepare an interview around his submission about an epistemological discussion around Semantic Web research. Together with one of the reviewers, Ruben Verborgh, he examined his contribution in this new format with success. The questions of the audience were answered during the succeeding discussion section.".
- publication abstract "Museums around the world have built databases with meta-data about millions of objects, their history, the people who created them, and the entities they represent. This data is stored in proprietary databases and is not readily available for use. Recently, museums embraced the Semantic Web as a means to make this data available to the world, but the experience so far shows that publishing museum data to the linked data cloud is difficult: the databases are large and complex, the information is richly structured and varies from museum to museum, and it is difficult to link the data to other datasets. We have been collaborating with the Smithsonian American Art Museum to create a set of tools that allow museums and other cultural heritage institutions to publish their data as Linked Open Data. In this demonstration we will show the end-to-end process of starting with the original source data, modeling the data with respect to a ontology of cultural heritage data, linking the data to DBpedia, and then publishing the information as Linked Open Data.".
- publication abstract "Joint Proceedings of Workshops AI4LEGAL2020, NLIWOD, PROFILES 2020, QuWeDa 2020 and SEMIFORM2020 Colocated with the 19th International Semantic Web Conference (ISWC 2020)".
- publication abstract "Statistics about constraints use in RDF data bring insights in common practices to address data quality. However, we only have such statistics for OWL axioms, not for constraint languages, such as SHACL or ShEx, that have recently become more popular. We extended previous work on axiom statistics to provide evidence of constraint type use. In this poster, we present preliminary statistics about the use of SHACL core constraints in data shapes found on GitHub. We found that class, datatype and cardinality constraints are predominantly used, similar to the dominant use of domain and range in ontologies. Less-used constraint types need further attention in visualization or modeling tools to address data quality issues. More constraints of SHACL but also ShEx need to be included to deepen the understanding. Data quality researchers and tool designers can make informed decisions based on the provided statistics.".
- publication abstract "Within ontology engineering, concepts are modeled as classes and relationships, and restrictions as axioms. Reusing ontologies requires assessing if existing ontologies are suited for an application scenario. Different scenarios not only influence concept modeling, but also the use of different restriction types, such as subclass relationships or disjointness between concepts. However, metadata about the use of such restriction types is currently unavailable, preventing accurate assessments for reuse. We created the RDF Data Cube-based dataset MontoloStats, which contains restriction use statistics for 660 LOV and 565 BioPortal ontologies.We analyze the dataset and discuss the findings and their implications for ontology reuse. The MontoloStats dataset reveals that 94% of LOV and 95% of BioPortal ontologies use RDFS-based restriction types, 49% of LOV and 52% of BioPortal ontologies use at least one OWL-based restriction type, and different literal value-related restriction types are not or barely used. Our dataset provides modeling insights, beneficial for ontology reuse to discover and compare reuse candidates, but can also be the basis of new research that investigates novel ontology engineering methodologies with respect to restrictions definition.".
- publication abstract "According to the Learning Analytics (LA) reference model, LA is used to collect, explore and analyze diverse types and interrelationships of data. Specifications like the Experience API (xAPI) work towards interoperability with respect to interrelationship of diverse learning data. Algorithms for adaptive learning could be improved by incorporation of user-related data, not present in learning activities. Linking these user-related data with learning activity data would fully exploit the potential of interrelationships with data. Conventional solutions, as well as current Linked Data-based solutions focus purely on learning activity data, whereas solutions based on Linked Data could be used to integrate data of different domains. We propose a provenance-aware pipeline to transform xAPI learning activity statements to Linked Data. The integration of learning activities with other user data, provides a more complete set of user data, improving an adaptive learning analytics system. We use the proposed pipeline to build a Linked Learning Record Store based on the Resource Description Framework (RDF). SPARQL queries are used to link data about learning activities, enriched with fine-grained exercise descriptions, with data describing the abilities of users. In this paper, we show how Linked Data can be generated from xAPI statements in a streaming approach, based on existing tools and interfaces. Our solution demonstrates the usage of Linked Data to combine learning activity data with user ability data, to get a more complete set of user data aiming to assist in adaptive learning.".
- publication abstract "The quality of knowledge graphs can be assessed by a validation against specified constraints, typically use-case specific and modeled by human users in a manual fashion. Visualizations can improve the modeling process as they are specifically designed for human information processing, possibly leading to more accurate constraints, and in turn higher quality knowledge graphs. However, it is currently unknown how such visualizations support users when viewing RDF constraints as no scientific evidence for the visualizations’ effectiveness is provided. Furthermore, some of the existing tools are likely suboptimal, as they lack support for edit operations or common constraints types. To establish a baseline, we have defined visual notations to represent RDF constraints and implemented them in UnSHACLed, a tool that is independent of a concrete RDF constraint language. In this paper, we (i) present two visual notations that support all SHACL core constraints, built upon the commonly used visualizations VOWL and UML, (ii) analyze both notations based on cognitive effective design principles, (iii) perform a comparative user study between both visual notations, and (iv) present our open source tool UnSHACLed incorporating our efforts. Users were presented RDF constraints in both visual notations and had to answer questions based on visualization task taxonomies. Although no statistical significant difference in mean error rates was observed, all study participants preferred ShapeVOWL in a self assessment to answer RDF constraint-related questions. Furthermore, ShapeVOWL adheres to more cognitive effective design principles according to our performed comparison. Study participants argued that the increased visual features of ShapeVOWL made it easier to spot constraints, but a list of constraints – as in ShapeUML – is easier to read. However, also that more deviations from the strict UML specification and introduction of more visual features can improve ShapeUML. From these findings we conclude that ShapeVOWL has a higher potential to represent RDF constraints more effective compared to ShapeUML. But also that the clear and efficient text encoding of ShapeUML can be improved with visual features. A one-size-fits-all approach to RDF constraint visualization and editing will be insufficient. Therefore, to support different audiences and use cases, user interfaces of RDF constraint editors need to support different visual notations. In the future, we plan to incorporate different editing approaches, informed by visualization task taxonomies, and non-linear workflows into UnSHACLed to improve its editing capabilities. Further research can built upon our findings and evaluate a ShapeUML variant with more visual features or investigate a mapping from both visual notations to ShEx constraints.".
- publication abstract "Current developments on the Web have been marked by the increased popularity of Linked Data and Web APIs. However, these two technologies remain mostly disjunct in terms of developing solutions and applications in an integrated way. Therefore, we aim to explore the possibilities of facilitating a better integration of Web APIs and Linked Data, thus enabling the harvesting and provisioning of data through applications and services on the Web. In particular, we focus on investigating how resources exposed via Web APIs can be used together with Semantic Web data, as means for enabling a shared use and providing a basis for developing rich applications on top.".
- publication abstract "Current developments on the Web have been marked by the increased popularity of Linked Data and Web APIs. However, these two technologies remain mostly disjunct in terms of developing solutions and applications in an integrated way. Therefore, we aim to explore the possibilities of facilitating a better integration of Web APIs and Linked Data, thus enabling the harvesting and provisioning of data through applications and services on the Web. In particular, we focus on investigating how resources exposed via Web APIs can be used together with Semantic Web data, as means for enabling a shared use and providing a basis for developing rich applications on top.".
- publication abstract "Proceedings of the Third Workshop on Services and Applications over Linked APIs and Data, co-located with the 12th Extended Semantic Web Conference (ESWC 2015)".
- publication abstract "Proceedings of the Fourth Workshop on Services and Applications over Linked APIs and Data, co-located with the 13th Extended Semantic Web Conference (ESWC 2016)".
- publication abstract "Proceedings of the Second Workshop on Semantic Web Technologies for the Internet of Things, co-located with 16th International Semantic Web Conference (ISWC 2017)".
- publication abstract "Proceedings of the Second Workshop on Semantic Web Technologies for the Internet of Things, co-located with 16th International Semantic Web Conference (ISWC 2017)".
- publication abstract "To date, there are almost no tools that support the elaboration and research of project ideas in media preproduction. The typical tools that are being used are merely a browser and a simple text editor. Therefore, it is our goal to improve this pre-production process by structuring the multimedia and accompanying annotations found by the creator, by providing functionality that makes it easier to find appropriate multimedia in a more efficient way, and by providing the possibility to work together. To achieve these goals, intelligent multimedia mind maps are introduced. These mind maps offer the possibility to structure your multimedia information and accompanying annotations by creating relations between the multimedia. By automatic connecting to external sources, the user can rapidly search different information sources without visiting them one by one. Furthermore, the content that is added to the mind map is analyzed and enriched; these enrichments are then used to give the user extra recommendations based on the content of the current mind map. Subsequently, an architecture for these needs has been designed and implemented as an architectural concept. Finally, this architectural concept is evaluated positively by several people that are active in the media production industry.".
- publication abstract "This paper describes best-practices in lifting an image metadata standard to the Semantic Web. We provide guidelines on how an XML-based metadata format can be converted into an OWL ontology. Additionally, we discuss how this ontology can be mapped upon the W3C’s Media Ontology. This ontology is a standardization effort of the W3C to provide a core vocabulary for multimedia annotations. The approach presented here can be applied to other XML-based metadata standards.".
- JIPS.2011.7.1.199 abstract "This paper describes best-practices in lifting an image metadata standard to the Semantic Web. We provide guidelines on how an XML-based metadata format can be converted into an OWL ontology. Additionally, we discuss how this ontology can be mapped upon the W3C’s Media Ontology. This ontology is a standardization effort of the W3C to provide a core vocabulary for multimedia annotations. The approach presented here can be applied to other XML-based metadata standards.".
- publication abstract "SPARQL endpoints suffer from low availability, and require to buy and configure complex servers to host them. With the advent of Linked Data Fragments, and more specifically Triple Pattern Fragments (TPFs), we can now perform complex queries on low-cost servers. Online file repositories and cloud hosting services, such as GitHub, Google Code, Google App Engine or Dropbox can be exploited to host this type of linked data for free. For this purpose we have developed two different proof-of-concept tools that can be used to publish TPFs on GitHub and Google App Engine. A generic TPF client can then be used to perform SPARQL queries on the freely hosted TPF servers.".
- publication abstract "The configuration of smart homes represents a difficult task for end-users. We propose a goal-driven approach to this challenge, where users express their needs using a graphical configuration environment. Our system then uses semantic descriptions of devices in the user’s surroundings to derive a plan to reach the desired situation. We are able to satisfy complex demands using only first-order logic, which makes this system flexible yet fast. The focus of this paper is to demonstrate how to achieve high usability of the proposed system without burdening users with the underlying semantic technologies. Our initial demo supports setting the ambient temperature, alarms, and media playback, but the use of semantics allows to extend the system with many different kinds of services in a decentralized way.".
- publication abstract "In practical terms, the implementation of service recuperation mechanisms based on semantics has been limited due to the expensive procedure comprised in the formal specification of services. This procedure comprises a time-consuming task of semantic annotation, which is carried out by hand by service developers, who also must know models for the semantic description of this type of resources (e.g. OWL-S, WSMO, SAWSDL). To overcome this limitation, a proposal was introduced for the annotation of web services, based on the processing of its available documentation in order to extract the information related to the capabilities offered by the services. By discovering the hidden semantic structure of said information by means of statistical analysis techniques, the proposed mechanism is capable of associating relevant annotations with the operations/resources of the services, as well as grouping them in non-exclusive semantic categories.".
- publication abstract "Social networks play an increasingly important role to provide fresh media items related to daily life moments or live coverage of events. One of the problems is that media are spread over multiple social networks. In this paper, we propose a social-network-agnostic approach for collecting recent images and videos which can be potentially attached to an event. The media can be later on processed for the automatic generation of visual summaries in the form of media galleries. Our approach includes the alignment of the varying search result formats that these social networks return while putting the media items in correspondence with the status updates and stories they are related to. More precisely we leverage on: (i) visual features from media items, (ii) Textual features from status updates, and (iii) social features from social networks to interpret, deduplicate, cluster, and visualize media items. We address the technical details of media item extraction and media item processing, discuss criteria for media item filtering and envision several visualization options for media presentation. We have developed a user interface publicly available at http://eventmedia.eurecom.fr/media-finder, which implements the framework detailed in this paper. The evaluation is divided in two parts: first we assessed the performances of the image process deduplication and then we propose a human evaluation of the summary creation compared with Teleportd and Twitter media galleries.".
- publication abstract "Open data initiatives have created a revolution in the route planning ecosystem for the public transport sector. The creation of a large amount of route planning services like Google Maps, CityMapper or Navitia, has only been possible thanks to the availability of public transport data as open data. Ever since the disclosure of the London public transport data sources as open data, more public transport companies are following their lead around the world. The benefits obtained by disclosing public transport datasets as open data are diverse and influence the different actors present in the route planning ecosystem: public transport organisations in the role of data publishers for instance may increase their revenue streams as new and better information channels attract more travellers. Also, new analysis and improvements to their operations become possible through feedback received from data reusers on areas where they do not collect data by themselves (e.g. crowdsourced data).".
- publication abstract "Using Linked Data based approaches, public transport companies are able to share their time tables and its updates in an affordable way while allowing user agents to perform multimodal route planning algorithms. Providing time table updates, usually published as data streams, means that data is being constantly modified and if there is a large analytical query, its response might be affected due to the changing data. In this demo we introduce a mechanism to tackle this problem by guaranteeing that a user agent will always receive version based responses, therefore ensuring data consistency. Such mechanism also enables access to historical data that could be used for deep analysis of transport systems. However, how this data shall be archived, in order to keep this approach scalable and inexpensive is still a matter of study. In a demonstrator, we published and query data from the Belgium national train system (SNCB) and Madrid Regional Transport Consortium (CRTM). This paper represents the first step towards establishing an affordable framework to publish reliable transport data.".
- publication abstract "Open data became a fundamental requirement for the route planning application ecosystem in the public transport sector. The way this data is published has a direct influence in the architectural design of route planning applications, which today tends towards centralized solutions (e.g. Google Maps, CityMapper). Linked Connections emerged as a more scalable and cost-efficient decentralized alternative for public transport open data publishing, compared to centralized approaches. However, how this impacts the different actors that belong to the route planning ecosystem is still unknown. In this work, we study and discuss the potential impact of Linked Connections through a set of evaluations that measure the technical and user-perceived performance of route planning applications that run on the client-side, compared to a traditional approach performing the route planning logic on the server-side. Results showed that (i) for some use cases, a Linked Connections based approach outperforms centralized approaches but it heavily depends on the underlying hardware. (ii) More than half of the travellers that participated on the tests preferred the Linked Connections client application due to additional features such as offline querying and privacy safeguarding regardless of the slower performance in some cases. This work provided insights and an initial assessment of the potential effects of implementing a decentralized open data publishing strategy on the public transport route planning ecosystem. The potential benefits of such an approach are aligned with the ideals of open data of fostering innovation, boosting economic growth and providing solutions for more specific necessities.".
- publication abstract "Route planning is key in application domains such as delivery services, tourism advice and ride sharing. Today’s route planning as a service solutions do not cover all requirements of each use case, forcing application developers to build their own self-hosted route planners. This quickly becomes expensive to develop and maintain, especially when it requires integrating data from different sources. We demo a configurable route planner that takes advantage of strategically designed data publishing approaches and performs data integration and query execution on the client. For this demonstrator, we (i) publish a Linked Connections interface for the public transit data in Helsinki, including live updates; (ii) integrate Routable Tiles, a tiled Linked Data version of OpenStreetMap road network and (ii) implement a graphical user interface, on top of the Planner.js SDK we have built, to display the query results. By moving the data integration to the client, we provide higher flexibility for application developers to customize their solutions according to their needs. While the querying might be slow today, these preliminary results already hint at different data publishing strategies that may increase query evaluation performance on the client-side.".
- publication abstract "Cycling as a means of urban transportation is positively correlated with cleaner, healthier and happier cities. One way to attract more cyclists, is to provide more infrastructure, such as secure parking facilities. However, authoritative information about parking facilities is heavily decentralized and heterogeneous, which makes secure parking facilities harder to discover by cyclists. Can an open dataset about bike parkings be managed decentrally? In this paper, we present the results of the Velopark project, carried out in Belgium by different actors that include local public authorities, public transport operators and pro-cycling organizations. During the project execution we (i) introduced the Open Velopark Vocabulary as a common semantic data model; and (ii) implemented the Velopark platform, an open data publishing environment for both static and live authoritative parking data. So far, 1599 parking facilities were published through the Velopark platform, 31 different Belgian municipalities and 4 parking related organizations use the platform to describe, publish and manage their parking facilities. A common data publishing environment supports organizations without a data infrastructure to provide access to their information, while following a common data model that guarantees reliable information for cyclists. In future work we will further extend our data model to cover other kinds of infrastructure and bicycle-related services.".
- publication abstract "Publishing transport data on the Web for consumption by others poses several challenges for data publishers. In addition to planned schedules, access to live schedule updates (e.g. delays or cancellations) and historical data is fundamental to enable reliable applications and to support machine learning use cases. However publishing such dynamic data further increases the computational burden for data publishers, resulting in often unavailable historical data and live schedule updates for most public transport networks. In this paper we apply and extend the current Linked Connections approach for static data to also support cost-efficient live and historical public transport data publishing on the Web. Our contributions include (i) a reference specification and system architecture to support cost-efficient publishing of dynamic public transport schedules and historical data; (ii) empirical evaluations on route planning query performance based on data fragmentation size, publishing costs and a comparison with a traditional route planning engine such as OpenTripPlanner; (iii) an analysis of potential correlations of query performance with particular public transport network characteristics such as size, average degree, density, clustering coefficient and average connection duration. Results confirm that fragmentation size influences route planning query performance and converges on an optimal fragment size per network. Size (stops), density and connection duration also show correlation with route planning query performance. Our approach proves to be more cost-efficient and in some cases outperforms OpenTripPlanner when supporting the earliest arrival time route planning use case. Moreover, the cost of publishing live and historical schedules remains in the same order of magnitude for server-side resources compared to publishing planned schedules only. Yet, further optimizations are needed for larger networks (> 1000 stops) to be useful in practice. Additional dataset fragmentation strategies (e.g. geospatial) may be studied for designing more scalable and performant Web API s that adapt to particular use cases, not only limited to the public transport domain.".
- publication abstract "Joint Proceedings of the 2nd RDF Stream Processing (RSP 2017) and the Querying the Web of Data (QuWeDa 2017) Workshops co-located with 14th ESWC 2017 (ESWC 2017)".
- publication abstract "When a Solid application writes data to a storage, it does not yet know all possible use cases for which this data will be used. With cross-app interoperability in mind, choosing an application profile and write-structure becomes guess-work: some apps will only work partially with more shallow semantics, some will work slower without a querying API such as a SPARQL endpoint, or some will just not work at all. This requires studying negotiation processes between what apps need to be able to read and how apps write. In this paper, we position our opinion that Solid apps must always write using the richest semantics available to them, which must be written to an interface that is tailored to handle replication and synchronization of data across services. We show how to store data as an Event Source on Solid using the Linked Data Event Streams specification applied to the use-case of storing personal location history. The history of the data is preserved by design, as an Event Source stores all the changes in the data over time. For reasoning of backwards compatibility, we also illustrate that for both reading and writing to the Event Source the complexities can be abstracted away towards a symmetric read/write interface following the current Solid specification.".
- publication abstract "In this paper, we present the first results of our ongoing early-stage research on a realtime Wikipedia-based, language-agnostic disaster detection and monitoring tool that leverages user-generated multimedia content shared on online social networking sites to help disaster responders prioritize their efforts. We make the tool and its source code publicly available as we make progress on it and strive to publish detected disasters and accompanying multimedia content following the Linked Data principles to facilitate its wide consumption, redistribution, and evaluation of its usefulness.".
- publication abstract "In this paper, we report on the task of near-duplicate photo detection in the context of events that get shared on multiple social networks. When people attend events, they more and more share event-related photos publicly on social networks to let their social network contacts relive and witness the attended events. In the past, we have worked on methods to accumulate such public user-generated multimedia content so that it can be used to summarize events visually in the form of media galleries or slideshows. Therefore, methods for the deduplication of near-duplicate photos of the same event are required in order to ensure the diversity of the generated media galleries or slideshows. First, we introduce the social-network-specific reasons and challenges that cause near-duplicate photos. Second, we introduce an algorithm for the task of deduplicating near-duplicate photos stemming from social networks. Finally, we evaluate the algorithm’s results and shortcomings.".
- publication abstract "Considerable efforts have been put into making video content on the Web more accessible, searchable, and navigable by research on both textual and visual analysis of the actual video content and the accompanying metadata. Nevertheless, most of the time, videos are opaque objects in websites. With Web browsers gaining more support for the HTML5 <video> element, videos are becoming first class citizens on the Web. In this paper we show how events can be detected on-the-fly through crowdsourcing (i) textual, (ii) visual, and (iii) behavioral analysis in YouTube videos, at scale. The main contribution of this paper is a generic crowdsourcing framework for automatic and scalable semantic annotations of HTML5 videos. Eventually, we present our preliminary results using traditional server-based approaches to video event detection as a baseline.".
- publication abstract "The chorus of the popular song TiK ToK by the artist Ke$ha goes “Don’t stop, make it pop. DJ, blow my speakers up. Tonight, I’mma fight. ’Til we see the sunlight. Tik tok on the clock. But the party don’t stop, no”. We all know, however, that each nightlife event, be it a party, concert, or bar evening, comes to an end eventually. With NiteOutMag, we present a Chrome Web application that can help people revive nightlife events in the recent past. Among the younger generation, nightlife activities—just like any other activity—together with related multimedia data get shared online on social networks. The problem is that for one and the same event, the event-related user-generated data may be shared on a plethora of social networks. Therefore, with this paper, we introduce an application that extracts, reconciles, and models events from several event databases or calendars, social data from multiple social networks, and media from some photo and video sharing platforms. The collected data is attached to events held in a given area and further processed to generate an event-centric magazine where each page represents an event illustrated by media items.".
- publication abstract "Social networking sites such as Facebook or Twitter let their users create microposts directed to all, or a subset of their contacts. Users can respond to microposts, or in addition to that, also click a Like or ReTweet button to show their appreciation for a certain micropost. Adding semantic meaning in the sense of unambiguous intended ideas to such microposts can, for example, be achieved via Natural Language Processing (NLP) and named entity disambiguation. Therefore, we have implemented a mash-up NLP API, which is based on a combination of several third party NLP APIs in order to retrieve more accurate results in the sense of emergence. In consequence, our API uses third party APIs opaquely in the background to deliver its output. In this paper, we describe how one can keep track of data provenance and credit back the contributions of each single API to the joint result of the combined mash-up API. Therefore, we use the HTTP Vocabulary in RDF and the Provenance Vocabulary. In addition to that, we show how provenance metadata can help understand the way a combined result is formed, and optimize the result formation process.".
- publication abstract "In May 2012, the Web search engine Google has introduced the so-called Knowledge Graph, a graph that understands real-world entities and their relationships to one another. Entities covered by the Knowledge Graph include landmarks, celebrities, cities, sports teams, buildings, movies, celestial objects, works of art, and more. The graph enhances Google search in three main ways: by disambiguation of search queries, by search log-based summarization of key facts, and by explorative search suggestions. With this paper, we suggest a fourth way of enhancing Web search: through the addition of realtime coverage of what people say about real-world entities on social networks. We report on a browser extension that seamlessly adds relevant microposts from the social networking sites Google+, Facebook, and Twitter in form of a panel to Knowledge Graph entities. In a true Linked Data fashion, we interlink detected concepts in microposts with Freebase entities, and evaluate our approach for both relevancy and usefulness. The extension is freely available, we invite the reader to reconstruct the examples of this paper to see how realtime opinions may have changed since time of writing.".
- publication abstract "Video has become a first class citizen on the Web with broad support in all common Web browsers. Where with structured mark-up on webpages we have made the vision of the Web of Data a reality, in this paper, we propose a new vision that we name the Web(VTT) of Data, alongside with concrete steps to realize this vision. It is based on the evolving standards WebVTT for adding timed text tracks to videos and JSON-LD, a JSON-based format to serialize Linked Data. Just like the Web of Data that is based on the relationships among structured data, the Web(VTT) of Data is based on relationships among videos based on WebVTT files, which we use as Web-native spatiotemporal Linked Data containers with JSON-LD payloads. In a first step, we provide necessary background information on the technologies we use. In a second step, we perform a large-scale analysis of the 148 terabyte size Common Crawl corpus in order to get a better understanding of the status quo of Web video deployment and address the challenge of integrating the detected videos in the Common Crawl corpus into the Web(VTT) of Data. In a third step, we open-source an online video annotation creation and consumption tool, targeted at videos not contained in the Common Crawl corpus and for integrating future video creations, allowing for weaving the Web(VTT) of Data tighter, video by video.".
- publication abstract "In this position paper, we describe how analogue recording artifacts stemming from digitalized VHS tapes such as grainy noises, ghosting, or synchronization issues can be identified at Web-scale via crowdsourcing in order to identify adult content digitalized by amateurs.".
- publication abstract "Video shot detection is the processor-intensive task of splitting a video into continuous shots, with hard or soft cuts as the boundaries. In this paper, we present a client-side on-the-fly approach to this challenge based on modern HTML5-enabled Web APIs. We show how video shot detection can be seamlessly embedded into video platforms like YouTube using browser extensions. Once a video has been split into shots, shot-based video navigation gets enabled and more fine-grained playing statistics can be created.".
- publication abstract "Linked Data interfaces exist in many flavours, as evidenced by subject pages, SPARQL endpoints, triple pattern interfaces, and data dumps. These interfaces are mostly used to retrieve parts of a complete dataset, such parts can for example be defined by ranges in one or more dimensions. Filtering Linked Data by dimensions such as time range, geospatial area, or genomic location, requires the lookup of data within ordinal ranges. To make retrieval by such ranges generic and cost-efficient, we propose a REST solution in-between looking up data within ordinal ranges entirely on the server, or entirely on the client. To this end, we introduce a method for extending any Linked Data interface with an n-dimensional interface-level index such that n-dimensional ordinal data can be selected using n-dimensional ranges. We formally define Range Gates and Range Fragments and theoretically evaluate the cost-efficiency of hosting such an interface. By adding a multidimensional index to a Linked Data interface for multidimensional ordinal data, we found that we can get benefits from both worlds: the expressivity of the server raises, yet remains more cost-efficient than an interface providing the full functionality on the server-side. Furthermore, the client now shares in the effort to filter the data. This makes query processing becomes more flexible to the end-user, because the query plan can be altered by the engine. In future work we hope to apply Range Gates and Range Fragments to real-world interfaces to give quicker access to data within ordinal ranges.".
- publication abstract "The world contains a large amount of sensors that produce new data at a high frequency. It is currently very hard to find public services that expose these measurements as dynamic Linked Data. We investigate how sensor data can be published continuously on the Web at a low cost. This paper describes how the publication of various sensor data sources can be done by continuously mapping raw sensor data to RDF and inserting it into a live, low-cost server. This makes it possible for clients to continuously evaluate dynamic queries using public sensor data. For our demonstration, we will illustrate how this pipeline works for the publication of temperature and humidity data originating from a microcontroller, and how it can be queried.".
- publication abstract "Existing solutions to query dynamic Linked Data sources extend the SPARQL language, and require continuous server processing for each query. Traditional SPARQL endpoints accept highly expressive queries, contributing to high server cost. Extending these endpoints for time-sensitive queries increases the server cost even further. To make continuous querying over real-time Linked Data more affordable, we extend the low-cost Triple Pattern Fragments (TPF) interface with support for time-sensitive queries. In this paper, we discuss a framework on top of TPF that allows clients to execute SPARQL queries with continuously updating results. Our experiments indicate that this extension significantly lowers the server complexity. The trade-off is an increase in the execution time per query. We prove that by moving the complexity of continuously evaluating real-time queries over Linked Data to the clients and thus increasing the bandwidth usage, the cost of server-side interfaces is significantly reduced. Our results show that this solution makes real-time querying more scalable in terms of CPU usage for a large amount of concurrent clients when compared to the alternatives.".
- publication abstract "Linked Datasets typically evolve over time because triples can be removed from or added to datasets, which results in different dataset versions. While most attention is typically given to the latest dataset version, a lot of useful information is still present in previous versions and its historical evolution. In order to make this historical information queryable at Web scale, a low-cost interface is required that provides access to different dataset versions. In this paper, we add a versioning feature to the existing Triple Pattern Fragments interface for queries at, between and for versions, with an accompanying vocabulary for describing the results, metadata and hypermedia controls. This interface feature is an important step into the direction of making versioned datasets queryable on the Web, with a low publication cost and effort.".
- publication abstract "In order to exploit the value of historical information in Linked Datasets, we need to be able to store and query different versions of such datasets efficiently. The Mighty Storage Challenge is being organized to discover the efficiency of such Linked Data stores and to detect their bottlenecks. One task in this challenge focusses on the storage and querying of versioned datasets, in which we aim to participate by combining the OSTRICH triple store and the Comunica SPARQL engine. In this article, we briefly introduce our system as an entry for the versioning task of this challenge. We present preliminary results that show that our system achieves fast query times for the supported queries, other queries are not supported by Comunica at the time of writing. These results of this challenge will serve as a guideline for further improvements to our system.".
- publication abstract "RDF Stream Processing (RSP) is a rapidly evolving area of research that focuses on extensions of the Semantic Web in order to model and process Web data streams. While state-of-the-art approaches concentrate on server-side processing of RDF streams, we investigate the TPF-QS method for server-side publishing of RDF streams, which moves the workload of continuous querying to clients. We formalize TPF-QS in terms of the RSP-QL reference model in order to formally compare it with existing RSP query languages. We experimentally validate that, compared to the state of the art, the server load of TPF-QS scales better with increasing numbers of concurrent clients in case of simple queries, at the cost of increased bandwidth consumption. This shows that TPF-QS is an important first step towards a viable solution for Web-scale publication and continuous processing of RDF streams.".
- publication abstract "The domain of RDF versioning concerns itself with the storage of different versions of Linked Datasets. The ability of querying over these versions is an active area of research, and allows for basic insights to be discovered, such as tracking the evolution of certain things in datasets. Querying can however only get you so far. In order to derive logical consequences from existing knowledge, we need to be able to reason over this data, such as ontology-based inferencing. In order to achieve this, we explore fundamental concepts on semantic querying of versioned datasets using ontological knowledge. In this work, we present these concepts as a semantic extension of the existing RDF versioning concepts that focus on syntactical versioning. We remain general and assume that versions do not necessarily follow a purely linear temporal relation. This work lays a foundation for reasoning over RDF versions from a querying perspective, using which RDF versioning storage, query and reasoning systems can be designed.".
- publication abstract "Linked Open Datasets on the Web that are published as RDF can evolve over time. There is a need to be able to store such evolving RDF datasets, and query across their versions. Different storage strategies are available for managing such versioned datasets, each being efficient for specific types of versioned queries. In recent work, a hybrid storage strategy has been introduced that combines these different strategies to lead to more efficient query execution for all versioned query types at the cost of increased ingestion time. While this trade-off is beneficial in the context of Web querying, it suffers from exponential ingestion times in terms of the number of versions, which becomes problematic for RDF datasets with many versions. As such, there is a need for an improved storage strategy that scales better in terms of ingestion time for many versions. We have designed, implemented, and evaluated a change to the hybrid storage strategy where we make use of a bidirectional delta chain instead of the default unidirectional delta chain. In this article, we introduce a concrete architecture for this change, together with accompanying ingestion and querying algorithms. Experimental results from our implementation show that the ingestion time is significantly reduced. As an additional benefit, this change also leads to lower total storage size and even improved query execution performance in some cases. This work shows that modifying the structure of delta chains within the hybrid storage strategy can be highly beneficial for RDF archives. In future work, other modifications to this delta chain structure deserve to be investigated, to further improve the scalability of ingestion and querying of datasets with many versions.".
- publication abstract "In recent years, research in information diffusion in social media has attracted a lot of attention, since the produced data is fast, massive and viral. Additionally, the provenance of such data is equally important because it helps to judge the relevance and trust-worthiness of the information enclosed in the data. However, social media currently provide insufficient mechanisms for provenance, while models of information diffusion use their own concepts and notations, targeted to specific use cases. In this paper, we propose a model for information diffusion and provenance, based on the W3C PROV Data Model. The advantage is that PROV is a Web-native and interoperable format that allows easy publication of provenance data, and minimizes the integration effort among different systems making use of PROV.".
- publication abstract "Containers – lightweight, stand-alone software executables – are everywhere. Industries exploit container managers to orchestrate complex cloud infrastructures and researchers in academia use them to foster reproducibility of computational experiments. Among existing solutions, Docker is the de facto standard in the container industry. In this paper, we advocate the value of applying the Linked Data paradigm to the container ecosystem’s building scripts, as it will allow adding additional knowledge, ease decentralized references, and foster interoperability. In particular we defined a vocabulary Dockeronto that allows to semantically annotate Dockerfiles.".
- publication abstract "Joint Proceedings of the 4th International Workshop on Linked Media and the 3rd Developers Hackshop, co-located with the 13th Extended Semantic Web Conference ESWC 2016".
- publication abstract "This paper introduces a Linked Data application for automatically generating a story between two concepts in the Web of Data, based on formally described links. A path between two concepts is obtained by querying multiple linked open datasets; the path is then enriched with multimedia presentation material for each node in order to obtain a full multimedia presentation of the found path.".
- publication abstract "Linked Data resources change rapidly over time, making a valid consistent state difficult. As a solution, the Memento framework offers content negotiation in the datetime dimension. However, due to a lack of formally described versioning, every server needs a costly custom implementation. In this poster paper, we exploit published provenance of Linked Data resources to implement a generic Memento services. Based on the W3C PROV standard, we propose a loosely coupled architecture that offers a Memento interface to any Linked Data service publishing provenance.".
- publication abstract "Between URI dereferencing and the SPARQL protocol lies a largely unexplored axis of possible interfaces to Linked Data, each of which comes with its own combination of trade-offs. One of these interfaces is Triple Pattern Fragments, which allows clients to execute SPARQL queries against low-cost servers, at the cost of higher bandwidth. To increase a client’s efficiency, we need to lower the number of requests, and one of the means for this is the incorporation of additional metadata in responses. We analyzed typical SPARQL query evaluations against Triple Pattern Fragments, and noted that a significant portion of requests consists of membership subqueries, which check the presence of a specific triple rather than a variable pattern. In this paper, we therefore study the impact of adding approximate membership functions, i.e., Bloom filters and Golomb-coded sets, as extra metadata. In addition to reducing http requests, such functions allow to achieve full result recall earlier when temporarily allowing lower precision. Half of the tested queries a WatDiv benchmark test set could be executed with up to a third fewer http requests with only marginally higher server cost. Query times, however, did not improve, likely due to slower generation time and transfer time. This indicates that approximate membership functions can partly improve the client-side query process with minimal impact on the server and its interface.".
- publication abstract "Read/Write infrastructures are often predicted to be the next big challenge for Linked Data. In the domains of Open Data and cultural heritage, this is already an urgent need. They require the exchange of partial graphs, personalised views on data and a need for trust. A strong versioning model supported by provenance is therefore crucial. However, current triple stores handle storage rather naïvely and don not seem up for the challenge. In this paper, we introduce R&Wbase, a new approach build on the principles of distributed version control. Triples are stored in a quad-store as consecutive deltas, reducing the amount of stored triples drastically. We demonstrate an efficient technique for storing different deltas in a single graph, allowing simple resolving of different versions and separate access. Furthermore, provenance tracking is included at operation level, since each commit, storing a delta and its metadata, is described directly as provenance. The use of branching is supported, providing flexible custom views on the data. Finally, we provide a straightforward way for querying different versions through SPARQL, by using virtual graphs.".
- publication abstract "MaaS is able to calculate the least expensive route over multiple mobility providers as billing information has already been standardized. However, this does not include third-party payment schemes where a third party, such as a local government, compensates a part of a travellers’ trip cost when certain criteria are met. To automatize third-party agreements for MaaS, we propose (i) a specification to set up a third-party payment system specifying, among others, how multimodal criteria and trips can be semantically described, and (ii) an open source validator tool returning the compensation for a trip. In future work, we are investigating how personal data can be integrated using Solid data pods.".
- publication abstract "In order to reduce the server-side cost of publishing queryable Linked Data, Triple Pattern Fragments (TPF) were introduced as a simple interface to RDF triples. They allow for SPARQL query execution at low server cost, by partially shifting the load from servers to clients. The previously proposed query execution algorithm uses more http requests than necessary, and only makes partial use of the available metadata. In this paper, we propose a new query execution algorithm for a client communicating with a TPF server. In contrast to a greedy solution, we maintain an overview of the entire query to find the optimal steps for solving a given query. We show multiple cases in which our algorithm reaches solutions with far fewer http requests, without significantly increasing the cost in other cases. This improves the efficiency of common SPARQL queries against TPF interfaces, augmenting their viability compared to the more powerful, but more costly, SPARQL interface.".
- publication abstract "Recently, Triple Pattern Fragments (TPFs) were introduced as an alternative to reduce server load when high numbers of clients need to evaluate SPARQL queries. This is achieved by moving part of the query execution to the client, at the cost of elevated query times. Since the TPF interface purposely does not support complex constructs such as SPARQL filters, queries that use them need to be executed mostly on the client, resulting in long execution times. We therefore investigated the impact of adding literal substring matching to the interface, with the goal of improving query performance while maintaining low server cost. In this paper, we discuss the client/server setup and compare performance of SPARQL queries on multiple implementations with existing solutions, including Elastic Search and case-insensitive FM-index. Our evaluations indicate that these improvements allow for faster query execution without significantly increasing the load on the server. Offering these additions on TPF servers allows users to obtain faster responses for filter-based SPARQL queries. Furthermore, substring matching can be used to support other filters such as complete regular expressions or range queries.".
- publication abstract "The early-to-mid 2000s economic downturn in the US and Europe forced Digital Humanities projects to adopt a more pragmatic stance towards metadata creation and to deliver short-term results towards grant providers. It is precisely in this context that the concept of Linked and Open Data (LOD) has gained momentum. In this tutorial, we want to focus on metadata cleaning and reconciliation, two elementary steps to bring cultural heritage collections into the Linked Data cloud. After an initial cleaning process, involving for example the detection of duplicates and the unifying of encoding formats, metadata are reconciled by mapping a domain specific and/or local vocabulary to another (more commonly used) vocabulary that is already a part of the Semantic Web. We believe that the integration of heterogeneous collections can be managed by using subject vocabularies for cross linking between collections, since major classifications and thesauri (e.g., LCSH, DDC, RAMEAU, etc.) have been made available following Linked Data Principles.".
- publication abstract "Libraries, archives and museums are facing up to the challenge of providing access to fast growing collections whilst managing cuts to budgets. Key to this is the creation, linking and publishing of good quality metadata as Linked Data that will allow their collections to be discovered, accessed and disseminated in a sustainable manner. Metadata experts Seth van Hooland and Ruben Verborgh introduce the key concepts of metadata standards and Linked Data and how they can be practically applied to existing metadata, giving readers the tools and understanding to achieve maximum results with limited resources. Readers will learn how to critically assess and use (semi-)automated methods of managing metadata through hands-on exercises within the book and on the accompanying website. Each chapter is built around a case study from institutions around the world, demonstrating how freely available tools are being successfully used in different metadata contexts. This handbook delivers the necessary conceptual and practical understanding to empower practitioners to make the right decisions when making their organisations resources accessible on the Web.".
- publication abstract "Don’t take your data at face value. That is the key message of this tutorial which focuses on how scholars can diagnose and act upon the accuracy of data. In this lesson, you will learn the principles and practice of data cleaning, as well as how OpenRefine can be used to perform four essential tasks that will help you to clean your data: 1. remove duplicate records; 2. separate multiple values contained in the same field; 3.aAnalyse the distribution of values throughout a data set; 4. group together different representations of the same reality. These steps are illustrated with the help of a series of exercises based on a collection of metadata from the Powerhouse Museum, demonstrating how (semi-)automated methods can help you correct the errors in your data.".
- publication abstract "The early-to-mid 2000s economic downturn in the US and Europe forced cultural heritage institutions to adopt a more pragmatic stance towards metadata creation and to deliver short-term results towards grant providers. It is precisely in this context that the concept of Linked and Open Data (LOD) has gained momentum. In this paper, we want to focus on reconciliation, the process in which we map domain specific vocabulary to another (often more commonly used) vocabulary that is part of the Semantic Web in order to annex the metadata to the Linked Data Cloud. We believe that the integration of heterogeneous collections can be managed by using subject vocabulary for cross linking between collections, since major classifications and thesauri (e.g. LCSH, DDC, RAMEAU, etc.) have been made available following Linked Data Principles. Re-using these established terms for indexing cultural heritage resources represents a big potential of Linked Data for libraries, archives and museums (LAM), but the application of LOD publishing still requires expert knowledge of Semantic Web technologies. Therefore, this paper aims to examine the feasibility of using subject vocabularies as linking hub to the Semantic Web in advance of such effort. Namely, we will examine and answer the two following questions: 1) what are currently the possibilities to reconcile metadata with controlled vocabularies in a completely automated manner with the help of non-expert tools, and 2) what are the characteristics of the reconciled metadata, and more specifically, do they offer a sufficient discriminatory value for search and retrieval? To provide an answer to these two questions, the paper gives a pragmatic overview of how free-text keywords from the Powerhouse museum (Sydney) can be successfully reconciled with the LCSH (Library of Congress Subject Headings) with the help of Google Refine. The different steps towards reconciliation are performed through freely available metadata and tools, making the process repeatable and understandable for collection holders. All the necessary tools, data and documentation will be made available on the project website FreeYourMetadata.org.".
- publication abstract "In this paper, we introduce a Media Decision Taking Engine (MDTE), enabling the automatic selection and/or rating of multimedia content versions, based on the available context information. The presented approach is fully semantic-driven, which means that we not only semantically model the context information, but also the decision algorithms themselves, which are represented in N3Logic, a rule language that extends RDF. The decision rules are based on a rating function, supporting the specification of weights and affinity parameters for each environment property. Finally, we show how the MDTE is integrated in a media delivery platform, using the provisions of the existing Web infrastructure.".
- s11042-012-1032-1 abstract "In this paper, we introduce a Media Decision Taking Engine (MDTE), enabling the automatic selection and/or rating of multimedia content versions, based on the available context information. The presented approach is fully semantic-driven, which means that we not only semantically model the context information, but also the decision algorithms themselves, which are represented in N3Logic, a rule language that extends RDF. The decision rules are based on a rating function, supporting the specification of weights and affinity parameters for each environment property. Finally, we show how the MDTE is integrated in a media delivery platform, using the provisions of the existing Web infrastructure.".
- publication abstract "Autonomous services discovery, composition and execution is an important problem in the Machine-to-Machine field. Achieving this objective requires addressing several issues: a) how to describe in a machine-understandable format which operations and functionalities an object is able to perform; b) how to represent the interfaces in unambiguous way and allow two or more machines to understand the data exchanged with each other; c) how to make a machine able to aggregate services in order to execute a specific task. Narrowing the domain just to REST APIs, we propose to semantically describe APIs (exposed by objects or web servers) using RESTdesc descriptions and to use JSON-LD as data exchange format. In order to illustrate the straightforward services composition and invocation process, we have implemented a smart client able to generate and execute plans (sequences of HTTP requests) that satisfy the set of operations which should be done for ensuring ideal environmental conditions to plants in a garden.".
- publication abstract "The generation of metadata, necessary to retrieve multimedia items conveniently, requires a large amount of manual work. Several processing algorithms that automate parts of this task exist, but they lack a global vision on the object under annotation. In this paper, we investigate how we can apply Semantic Web knowledge to integrate and enhance current processing algorithms in order to answer more advanced metadata queries. We propose a generic problem-solving platform that uses Web services and various knowledge sources to find solutions to complex requests. The platform employs a reasoner-based composition algorithm, generating an execution plan that combines several algorithms as services. It then supervises the execution of this plan, intervening in case of errors or unexpected behavior. We illustrate our approach by a use case in which we annotate the names of people depicted in a photograph.".
- publication abstract "Many have left their footprints on the field of semantic RESTful Web service description. Albeit some of the propositions are even W3C Recommendations, none of the proposed standards could gain significant adoption with Web service providers. Some approaches were supposedly too complex and verbose, others were considered not RESTful, and some failed to reach a significant majority of API providers for a combination of the reasons above. While we neither have the silver bullet for universal Web service description, with this paper, we want to suggest a lightweight approach called RESTdesc. It expresses the semantics of Web services by pre- and postconditions in simple N3 rules, and integrates existing standards and conventions such as Link headers, HTTP OPTIONS, and URI templates for discovery and interaction. This approach keeps the complexity to a minimum, yet still enables service descriptions with full semantic expressiveness. A sample implementation on the topic of multimedia Web services verifies the effectiveness of our approach.".
- s11042-012-1004-5 abstract "Many have left their footprints on the field of semantic RESTful Web service description. Albeit some of the propositions are even W3C Recommendations, none of the proposed standards could gain significant adoption with Web service providers. Some approaches were supposedly too complex and verbose, others were considered not RESTful, and some failed to reach a significant majority of API providers for a combination of the reasons above. While we neither have the silver bullet for universal Web service description, with this paper, we want to suggest a lightweight approach called RESTdesc. It expresses the semantics of Web services by pre- and postconditions in simple N3 rules, and integrates existing standards and conventions such as Link headers, HTTP OPTIONS, and URI templates for discovery and interaction. This approach keeps the complexity to a minimum, yet still enables service descriptions with full semantic expressiveness. A sample implementation on the topic of multimedia Web services verifies the effectiveness of our approach.".
- publication abstract "Many providers offer Web APIs that expose their services to an ever increasing number of mobile and desktop applications. However, all interactions have to be explicitly programmed by humans. Automated composition of those Web APIs could make it considerably easier to integrate different services from different providers. In this paper, we therefore present an automated Web API composition method, based on theorem-proving principles. The method works with existing Semantic Web reasoners at a Web-scale performance. This makes proof-based composition a good choice for Web API integration. We envision this method for use in different fields, such as multimedia service and social service composition.".
- publication abstract "Proceedings of the Workshop on Decentralizing the Semantic Web 2017, co-located with 16th International Semantic Web Conference (ISWC 2017)".
- publication abstract "Proceedings of the 2nd Workshop on Decentralizing the Semantic Web 2018, co-located with 17th International Semantic Web Conference (ISWC 2018)".
- publication abstract "Proceedings of the ESWC Developers Workshop 2015, co-located with the 12th Extended Semantic Web Conference (ESWC 2015)".
- publication abstract "Current search technologies can only harness the ever increasing amount of multimedia data when sufficient metadata exists. Several annotations are already available, yet they seldom cover all aspects. The generation of additional metadata proves costly; therefore efficient multimedia retrieval requires automated annotation methods. Current feature extraction algorithms are limited because they do not take context into account. In this article, we indicate how Linked Data can provide information that is vital to create an interpretation context. As a result, advanced interactions between algorithms, information and context will enable more advanced interpretation of multimedia data. Eventually, this will reflect in better search possibilities for the end user.".
- publication abstract "Technologically speaking, the Internet is a decentralized network: The infrastructure is spread across the globe and there is no single actor whose sudden termination would bring an end to it. However, that does not mean that all applications on top of the Internet are necessarily decentralized, since many of them depend on infrastructure controlled by only their organization. This chapter aims to raise awareness over the potential advantages that decentralization, as opposed to a fully centralised system, could bring to digital innovations in general. More specifically, we shed light over the current technical, as well as legal (with a focus on data protection and IP law) challenges related to decentralization in order to shed light over its potential repercussions in the context of the industrial internet.".
- publication abstract "Without any exaggeration, the Linked Data movement has significantly changed the Semantic Web world. Meanwhile, intelligent services—the other pillar of the initial Semantic Web vision—have not undergone a similar revolution. Although several important steps were taken and significant milestones were reached, we are far from our envisioned destination.".