APS Updates

Archive for February 2013

Consuming XCRI-CAP II: XCRI eXchange Platform (XXP)

leave a comment »

XXP experiences

Since I helped to specify the XCRI eXchange Platform, and I’m currently seeking more institutions to use it, I do have an interest. However, I don’t do the very techie, database development or systems development work on it, so I’m more a very experienced user and partially designer.


The purpose of XXP is to provide an XCRI-CAP service platform, so it has facilities for loading XCRI-CAP data, though not yet fully automatic ones. The platform has been designed specifically for XCRI-CAP, so its main functions are to provide input and output services that are likely to be relevant to the community. For example, it has CPD and Part Time course data entry facilities, enabling providers to key and maintain these types of course very easily, with vocabularies optimised for the purpose. There is also a CSV loader for those who can output CSV but not XCRI-CAP – this effectively provides a conversion from CSV to XCRI-CAP 1.2, because like all the XXP services, loading in the data enables creation of output XCRI-CAP feeds (both SOAP and RESTful).

Importantly XXP has a feed register (discovered by our Cottage Labs colleagues for their Course Data Programme demonstrator project), so that you can discover where the feed is, who’s responsible for it, what it covers and so on.

XXP is defined by the input and output requirements that APS and Ingenius Solutions have currently provided in response to their perception of market demand. This necessarily changes as more institutions get their data sorted out. While the focus in XXP is on acting as an agent for a provider (a university or college), XXP is effectively an interface between the provider and other aggregating organisations. It enables the creation of ‘value-added’ feeds enhanced by extra data (such as addition of vocabularies, like those for course type, or subject) and by transformation of data (typically concatenating or splitting text fields, or mapping from one classification system or vocabulary to another).

Getting XCRI-CAP data into XXP is at the moment not completely automatic. The main routines are through a manual load – which is fairly time consuming – or through an automatic CSV load (data2XCRI service), requiring a CSV file. In fact (and somewhat bizarrely) it’s not difficult to produce the CSV file from an existing XCRI-CAP file, then load it in. This is a stopgap measure till XXP has a fully functioning XCRI-CAP loader.

My use of the XXP consumption of XCRI-CAP at the moment has been using a push method – I stay in control of the operation and can make sure it all works as expected. XXP has a straightforward read-only View function so you can see the data in the system when loaded. If changes need to be made, then you make them at source (upstream); if there was an edit function for the XXP-loaded data, you would wipe out changes when you next loaded the data in.

As the data content going into XXP is controlled directly by the provider, XXP imports whole data sets, not updates. This simplifies the process considerably on both sides, which can focus entirely on live complete data sets. Maybe this needs a bit more explanation. I figure that if the provider controls the data, then the current data in XXP won’t have been ‘enhanced’ by manual edits or upgraded data. Therefore, it’s safe to completely overwrite all the data for the provider – that won’t wipe out anything useful that we’re not going to add back in. This is in contrast to ‘delta update’ methods that compare old and new data sets and just pump in the changed material. It’s much simpler, which has some merit.

Some of the difficulties that had to be overcome in the XXP aggregation:

  • Use of URLs as internal identifiers (ie inside XXP) for linking courses and presentations – this is overcome either by using a new-minted internal identifier or by re-constructing it (keeping the unique right-hand part).
  • On-the-fly refinements using xsi:type – this is a technical problem as many tools don’t like (read: tolerate) xsi:type constructions, or indeed any type of redefinitions, extensions or restrictions. This requires workarounds for or at least careful handling of extended <description> types.
  • Non-normalised material in XCRI-CAP structures. For example, <venue> is nested in presentations, therefore repeated. As the XCRI-CAP is parsed, you may find new venues or repeated venues that need to be processed. Ideally all venues should be processed prior to course>presentation structures, so it may be best to pass once through the file to discover all the venues, then a second time to populate the rest.
  • Incomplete bits. For example, the venues referred to in the previous bullet may simply have a title and postcode. XXP has a facility for adding missing data to venues, so that the output XCRI-CAP feed can be more complete.
  • Matching of vocabularies. Some feeds may use JACS, others may use LDCS, others simply keywords, and yet all the data goes into a subject field – this requires a method to store the name of classification and version number (JACS 1.7, 2 and 3 are substantially different).

A substantial advantage of XXP is that once you’ve put the data in (in whatever format), you can get it out very easily – currently as XCRI-CAP SOAP and RESTful getCourses, but there’s no reason why other APIs couldn’t be added for JSON, HTML, RDF and so on. This effectively means that XXP can have mapping and transformation services into and out of XCRI-CAP, adding value for particular ‘flavours’ or for new versions.

XCRI-CAP: turn 12 days of keying into 3 hours of checking.


Written by benthamfish

February 25, 2013 at 5:30 pm

Posted in Uncategorized

Consuming XCRI-CAP I

leave a comment »

This post and a few later ones will be some musings on my experiences of how XCRI-CAP is or might be consumed by aggregating organisations and services. I’ll not go into the theoretical models of how it could be done, but I’ll touch on the practicalities from my perspective. Which, I admit, is not as a ‘proper’ technical expert: I don’t write programmes other than the occasional simplistic perl script, neither do I build or manage database systems, other than very simple demonstrators in MS Access, and I dabble in MySQL and SQL Server only through the simplest of front end tools.

My main XCRI-CAP consuming efforts have been with four systems: XXPTrainagain, Skills Development Scotland’s Bulk Import Facility and K-Int’s Course Data Programme XCRI-CAP Aggregator.

XXP characteristics

  • Collaborative working between APS (my company) and Ingenius Solutions in Bristol
  • Service platform for multiple extra services, including provider and feed register (for discovery of feeds), AX-S subject search facility, CSV to XCRI converter, web form data capture, getCourses feed outputs (SOAP and RESTful)
  • Doesn’t yet have an auto-loader for XCRI-CAP. We can load manually or via our CSV to XCRI facility.

Trainagain characteristics

  • Existing system with its own established table structure, its own reference data and own courses data
  • SQL Server technology
  • I have off-line ‘sandbox’ version for playing around with.

Skills Development Scotland Bulk Import Facility characteristics

  • XCRI-CAP 1.1 not 1.2
  • Existing live XCRI-CAP aggregation service (push architecture)
  • Works in conjunction with the PROMT data entry system

K-Int XCRI-CAP Aggregator characteristics

  • Built on existing Open Data Aggregator, a generalised XML consuming service.
  • Takes a ‘relaxed’ view to validation – not well-formed data can be imported.
  • Outputs JSON, XML and HTML. But not XCRI-CAP.

These are early days for data aggregation using XCRI-CAP. There’s been a chicken-and-egg situation for a while. Aggregating organisations won’t readily invest in facilities to consume XCRI-CAP feeds until a large number of feeds exist, while HEIs don’t see the need for a feed if no-one is ready to consume them. The Course Data Programme takes the second one of these (I guess that’s the egg??) problems – if we have 63 XCRI-CAP feeds, then we should have a critical mass to provoke aggregating organisations to consume them.

Some of the questions around consumption of XCRI-CAP feeds centre on technical architecture issues (Push or Pull?), what type of feed to publish (SOAP, RESTful, just a file?), how often should the feed be updated and / or consumed (real-time updating? weekly?, quarterly? annually? Whenever stuff changes?), how do the feed owners know who’s using it? (open access v improper usage, copyright and licencing). Some of these issues are inter-related, and there are other practical issues around consuming feeds for existing services – ensuring that reference data is taken into account, for example.

I’ll try to tease out my impressions of the practicalities of consuming XCRI-CAP in various ways over the next few blog posts.

XCRI-CAP: turn 12 days of keying into 3 hours of checking.

Written by benthamfish

February 21, 2013 at 3:11 pm

In the Fishbowl

leave a comment »

Very interesting tweet chat on Uni of Nottingham’s Fishbowl, lunchtime business chat about the extra bits of work that are done for clients out of a sense of goodwill, professionalism or just not estimating the time and effort required. 

Written by benthamfish

February 14, 2013 at 5:36 pm

Posted in Uncategorized

What’s the point of XCRI-CAP?

leave a comment »

What’s the point of XCRI-CAP? This has been a cry for quite a while, even amongst some of the projects in the JISC funded Course Data Programme. Well, this is a story about how I’ve found it useful.

Many years ago I left ECCTIS 2000, the UK’s largest courses information aggregator and publisher, having been technical lead there for 8 years. Over that period of 8 years, during which we moved our major platform from CD-ROM (remember them?) to the web, we established a state-of-the-art course search system with integrated data picked up from:

  • course marketing information (keyed, classified and QAed by Hobsons Publishing),
  • text files from professional bodies (keyed by them, but marked up by us),
  • advertising copy and images (also keyed by the supplier and marked up by us),
  • subject-based statistics from HESA,
  • vacancy information (at appropriate times of the year) from UCAS,
  • and so on.

We used a new-fangled technology called Standard Generalised Markup Language (SGML) with our own bespoke markup.

The technology allowed us to produce separately versioned searchable products for three flavours of CD-ROM (Scotland, rest of UK, international), the web and for printed publications, all from the same integrated data set. Our system enabled us to aggregate data received from multiple sources, including huge data sets of well-structured text (from Hobsons), quite large statistical sources (HESA), and smaller ‘freestyle’ text items from advertisers and other organisations that we marked up ourselves. Shades of XCRI-CAP Demonstrator projects, but 20 years ago. ECCTIS 2000 was a major aggregator, and probably *the* major UK courses information aggregator of the time. Our development built on some highly innovative work carried out by The Open University in the 1980s, including seminal developments in CD-ROM technology, but that’s another story.

Much of my career to date had been centred on the development of standard methods for managing course marketing information as an aggregator. Quite a bit of my freelance career was to be on the other side of the fence, helping HEIs to manage courses information as providers, though I’ve always had some involvement in the aggregating organisation field.

APS Ltd, my small company, was fortunate enough to gain a contract from The Open University to act as their agent for disseminating course marketing information to the wider world of the emerging course search sites on the web. The main ones from the OU’s viewpoint at that time were the British Council, Graduate Prospects, the learndirect services in the countries of the UK. I also set up, for UCAS, its ‘data collection system’ through which UCAS obtained the courses data not used in its application system, but supplied on to third parties (such as learndirect, newspapers, Hotcourses and others).

Most of these small acts of data collection and dissemination were carried out by what are now seen as ‘traditional’ methods: re-keying from prospectuses, keying directly into a supplier’s web form. However, in a few cases (not nearly enough in my view) we were able to obtain electronic files from HEIs – for example, as I was managing the OU dissemination and the UCAS data collection input, it seemed sensible to me to provide the data electronically and to import it automatically. No problem.

At that point, it occurred to me that if I could do this for the OU data, why not for many other HEIs? One reason was lack of standards, the other main one was the chaos in course marketing systems (where they existed) in HEIs – understandable as most were desperately trying to come to terms with new internet technologies, particularly websites, and how these related to their paper prospectuses.

My initial solution was to use SGML (XML being a twinkle in someone’s eye at that time) to create a ‘lowest common denominator’ structure and format for courses information, convert data into that format, then write a suite of programmes to create bespoke outputs for course information aggregrating organisations. There ensued a ‘happy time’ of 3 to 4 years during which we would acquire the OU data in a convenient database format, carry out a swathe of well-documented software-driven and mainly automatic processes, produce a range of output files (Access databases, spreadsheets, CSV files) and fling them around the country for up to ten or so aggregating organisations to import. For learndirect Scotland, to take just one example, we would produce a series of CSV files, email them off and they would load them into their database. Time taken: maybe 5 minutes for the automatic processing, 30 minutes for checking.

OU Course Data Converter Suite

OU Course Data Converter Suite

I stress here that our supply of OU data to learndirect Scotland before 2007 took us about 35 minutes, 90% of that simply checking the data. We would supply updates five times per year, so our total annual time specifically on the learndirect Scotland update would have been significantly less than half a day. However, in a re-organisation, learndirect Scotland disappeared, and in place of their system that imported the OU data, the replacement organisation implemented a new one called PROMT. Ironically, this new system was anything but, from our perspective. With no import mechanism, we were required to key everything from scratch into their bespoke and somewhat eccentric client software – our task went from 35 minutes to 2 to 3 days (the OU had over 1,200 presentations), and the annual task leapt from less than half a day to about 12 days. A double irony: behind their clunky client software was XML and various other interoperability techniques, completely unavailable to those supplying the data.

This was the situation in 2007, and our ‘happy time’ ended, as everyone rapidly stopped taking bulk updates and moved to the ‘easier’ method of forcing HEIs to re-key their data into bespoke web forms. Our time to update the OU data more than doubled – so much for new technology! There was much grinding of teeth (and not just from APS, but from colleagues across the sector).

By now, you should be able to see where I’m coming from in respect of XCRI-CAP.

So, what’s the point of XCRI-CAP? My final illustration: Skills Development Scotland has now done us proud. In addition to their PROMT software (now somewhat improved), they have set up an excellent bulk import facility for providers to use to supply XCRI-CAP 1.0 or 1.1 data (and I’m sure we can persuade them to use 1.2 soon). APS is now using this facility, coupled with The Open University’s XCRI-CAP 1.1 feed, to get back to our ‘happy time’ again; only better, because now everyone can have ‘happy times’ if more aggregators use XCRI-CAP.

XCRI-CAP: turn 12 days of keying into 3 hours of checking.


APS has also produced a ‘value added’ XCRI-CAP 1.2 feed for any aggregator to use: http://www.alanpaull.co.uk/OpenUniversityXCRI-CAP1-2.xml. As we are able to tweak this feed in response to specific aggregator requirements, please get in contact with alan@alanpaull.co.uk, if you would like to use this feed, or to discuss how APS might help you with your courses information dissemination. We also have a range of services through the XXP Platform.

Written by benthamfish

February 14, 2013 at 9:44 am

Posted in XCRI

Typing woes

leave a comment »

Getting the right courses: slicing and dicing courses in #coursedata #xcri feeds. New blog post at: http://www.xcri.co.uk/h2-mm2-xcriblog.html.

Written by benthamfish

February 4, 2013 at 8:39 pm

Posted in Uncategorized