APS Updates

Archive for March 2013

A slice of Salami: integrating course and job profile searching


The Salami Layer

We’ve been developing a prototype of the ‘Salami Layer’ idea first mooted a while back as a result of the University of Nottingham’s Salami project. This is all about linking data together to make useful services for people, and to provide more nodes in a growing network of interoperable data.

Salami focused on labour market information. We’ve been taking it forward in the MUSAPI (MUSKET-SALAMI Pilots) project with a view to producing a hybrid service (or services) that use both the MUSKET text description comparison technology and the SALAMI layer material to link together courses and job profiles.

Salami HTML Demo

Thanks to the skill of our newest member of staff at APS (Jennifer Denton), we now have a demonstrator here: http://81.187.183.45/Salami/salami. It uses recently published XCRI-CAP feeds from The Open University, the Courtauld Institute and the University of Leicester as the source of its courses information (noting that these are not necessarily comprehensive feeds). Job profile information has come from Graduate Prospects, the National Careers Service and Target Jobs.

The purpose of the demonstrator is to show how we can link together subject concepts that are used to find courses with occupation concepts used to find job profiles. It relies on classifying courses with appropriate terms (in this case JACS3) for the discovery of relevant courses, mapping subject concepts to occupation concepts, and then linking in the job profiles. This last task was done by attaching the profiles to the occupation terms (in this case CRCI – Connexions Resource Centre Index – terms), rather than by searching; that will come later. All of these bits were wrapped up in a thesaurus. We then made it all go via a MySQL database, some Java code and a web page. There are some sharp edges still, as we haven’t finished cleaning up the thesaurus, but I think it shows the principles.
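
For the technically curious, here is a rough sketch in Java (over JDBC) of the kind of lookup chain the demonstrator performs: start from a chosen subject term, follow the subject-to-occupation links, and pick up the job profile URLs attached to each occupation term. The table names, column names and connection details are illustrative assumptions, not our actual schema.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Illustrative sketch of the subject-to-occupation-to-profile lookup chain.
// Table and column names are assumptions, not the demonstrator's real schema.
public class SalamiLookup {

    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost/salami", "user", "password")) {

            // 1. Find occupation terms mapped to the chosen subject term (S>O links).
            String occupationSql =
                "SELECT o.term FROM subject_occupation_link l " +
                "JOIN thesaurus_term o ON o.id = l.occupation_id " +
                "JOIN thesaurus_term s ON s.id = l.subject_id " +
                "WHERE s.term = ?";

            // 2. Follow each occupation term to its attached job profile URLs.
            String profileSql =
                "SELECT p.title, p.url FROM occupation_profile_link p " +
                "JOIN thesaurus_term o ON o.id = p.occupation_id " +
                "WHERE o.term = ?";

            try (PreparedStatement ps = con.prepareStatement(occupationSql)) {
                ps.setString(1, "history of art");
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        String occupation = rs.getString("term");
                        System.out.println("Occupation: " + occupation);
                        try (PreparedStatement pp = con.prepareStatement(profileSql)) {
                            pp.setString(1, occupation);
                            try (ResultSet prs = pp.executeQuery()) {
                                while (prs.next()) {
                                    System.out.println("  Profile: " + prs.getString("title")
                                            + " -> " + prs.getString("url"));
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```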

We haven’t used random keywords, but well-known classification systems instead, so that we can develop a discovery service that produces relevant and ranked results (eventually), not just a Google-style million-hits listing.

The way the demonstrator works is as follows:

  • Select a term from the drop-down list at the top. This list consists of our thesaurus terms: a mixture of academic subject terms for searching for courses and occupation terms for searching for job profiles. You can start typing, and the list will jump to that place. For example, try “History of Art”.
  • Then click Select. This will bring up a list of Related Terms (broader, narrower and related terms with respect to your selection), Subject/Occupation Terms (if you’ve picked a subject, it will show related Occupation Terms; if you’ve picked an occupation, it will show related Subject Terms), and Links to Further Information.
[Image: Salami Demo 1]
[Image: Salami Demo 2]
  • You can navigate around the search terms we use by clicking on the Refine button next to the entries in the Related Terms and Subject/Occupation Terms lists. For example, if you click on Refine ‘history by topic’, this changes your focus to the ‘history by topic’ subject, and you can then navigate the subject hierarchy from there. If you click on Refine ‘heritage manager’, this changes your focus to that occupation, and you can navigate further around jobs in information services or various related subjects. (A rough sketch of this navigation model appears after this list.)
[Image: Salami Demo 3]
  • At the bottom of the page we have a list of links to further information. These will be either links to relevant courses or links to job profiles. The former are drawn from XCRI-CAP feeds; the latter are currently hard-wired into our thesaurus (we’re developing a method of using live searches for both types of link). For example, for “heritage manager” we have links to the Graduate Prospects and Target Jobs profiles for Heritage Manager.
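
Here is a minimal, self-contained sketch of the navigation model behind the Refine buttons: each thesaurus term carries broader, narrower and related terms, and refining simply moves the focus to the chosen term. The class, fields and example terms are illustrative only, not the demonstrator’s actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of the Refine navigation: each term knows its broader,
// narrower and related terms, and refining moves the focus to one of them.
public class ThesaurusTerm {
    final String label;
    final List<ThesaurusTerm> broader = new ArrayList<>();
    final List<ThesaurusTerm> narrower = new ArrayList<>();
    final List<ThesaurusTerm> related = new ArrayList<>();

    ThesaurusTerm(String label) {
        this.label = label;
    }

    // All terms offered as Refine targets for the current focus.
    List<ThesaurusTerm> refineOptions() {
        List<ThesaurusTerm> options = new ArrayList<>();
        options.addAll(broader);
        options.addAll(narrower);
        options.addAll(related);
        return options;
    }

    public static void main(String[] args) {
        ThesaurusTerm historyByTopic = new ThesaurusTerm("history by topic");
        ThesaurusTerm historyOfArt = new ThesaurusTerm("history of art");
        historyByTopic.narrower.add(historyOfArt);
        historyOfArt.broader.add(historyByTopic);

        // Refining from 'history of art' offers 'history by topic', one step up the hierarchy.
        for (ThesaurusTerm t : historyOfArt.refineOptions()) {
            System.out.println("Refine -> " + t.label);
        }
    }
}
```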

The upshot of the demonstrator is that we can show how to integrate the discovery of both courses and job profiles (and later on, job opportunities) using a single search term.

Oh-So Thesaurus

The technological underpinning of this is our thesaurus, which has the following broad components (sketched roughly in code after the list).

  • A ‘master’ table of thesaurus terms with attached classifications (in particular JACS3 for subjects and CRCI for job profiles).
  • A table of occupation-to-subject term links (O>S).
  • A table of subject-to-occupation term links (S>O).
  • A table of occupation-to-profile links, currently used to hold the job profile URLs.
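
As a rough illustration only, those four components might look something like this if created over JDBC against MySQL; the table and column names are assumptions made for the sake of the sketch, not the real schema.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Illustrative set-up of the four thesaurus components as MySQL tables.
// Names and types are assumptions for the sketch, not the production schema.
public class ThesaurusSchema {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost/salami", "user", "password");
             Statement st = con.createStatement()) {

            // 'Master' table: one row per thesaurus term, with its classification
            // (a JACS3 code for subjects, a CRCI code for occupations).
            st.executeUpdate("CREATE TABLE IF NOT EXISTS thesaurus_term (" +
                "id INT PRIMARY KEY AUTO_INCREMENT, " +
                "term VARCHAR(255) NOT NULL, " +
                "term_type ENUM('subject','occupation') NOT NULL, " +
                "classification VARCHAR(20))");

            // Occupation-to-subject term links (O>S).
            st.executeUpdate("CREATE TABLE IF NOT EXISTS occupation_subject_link (" +
                "occupation_id INT NOT NULL, subject_id INT NOT NULL)");

            // Subject-to-occupation term links (S>O).
            st.executeUpdate("CREATE TABLE IF NOT EXISTS subject_occupation_link (" +
                "subject_id INT NOT NULL, occupation_id INT NOT NULL)");

            // Occupation-to-profile links, carrying the job profile URLs.
            st.executeUpdate("CREATE TABLE IF NOT EXISTS occupation_profile_link (" +
                "occupation_id INT NOT NULL, " +
                "title VARCHAR(255), url VARCHAR(512))");
        }
    }
}
```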

Inclusion of JACS3 codes on the course records and occupation codes on the job profiles is key to the discovery process, so that we can focus on concepts, not string searching. This means, for example, that a search for ‘history of art’ will find courses such as ‘MA in Conservation of Wall Painting’ or ‘MA in Art History’ (Courtauld Institute and Open University respectively), even though neither the records nor the web pages for these courses contain the string ‘history of art’.
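
A minimal sketch of what that concept-based discovery might look like, assuming a hypothetical course table that carries JACS3 codes alongside the thesaurus table sketched earlier:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Illustrative concept-based course discovery: courses are matched on the
// JACS3 code attached to them, not on strings in their titles.
// The course table and column names are assumptions for this sketch.
public class ConceptSearch {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost/salami", "user", "password")) {

            // Resolve the thesaurus term to its JACS3 classification,
            // then match courses on that code.
            String sql =
                "SELECT c.title, c.provider FROM course c " +
                "JOIN thesaurus_term t ON t.classification = c.jacs3_code " +
                "WHERE t.term = ?";

            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setString(1, "history of art");
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // e.g. 'MA in Conservation of Wall Painting', even though
                        // its title never mentions 'history of art'.
                        System.out.println(rs.getString("title")
                                + " (" + rs.getString("provider") + ")");
                    }
                }
            }
        }
    }
}
```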

Perhaps more importantly, we can find out that, if we’re interested in the history of art, there are several job areas that might well be relevant: not simply work in museums and galleries, but also heritage manager. And if we browse only one step from there, we can find occupation areas across the whole world of information services, from archaeologist to social researcher, from translator to patent attorney. All of these possibilities can be discovered without going from this service to any form of separate ‘careers search’ website.

Further extensions

Our Salami demonstrator suggests that this approach could be extended to other areas. Perhaps we can link in standard information about qualifications, just a short hop from courses. Maybe we can classify competencies or competence frameworks and link these to courses via vocabularies for learning outcomes / competence / curriculum topics.

The other strand in MUSAPI is the textual description comparison work using the MUSKET technology. Even via our Salami demonstrator, the results are bald, undifferentiated lists. If we can capture a range of search concepts from the user – parameters from their current circumstances, past skills, experience, formal and informal education and training, and aspirations – then we could use the MUSKET tools against the Salami results to help put them into some form of rank order. The user would then be able to refine this to produce higher-quality results in relation to that individual’s needs, and our slice of salami will have stretched a long way.

Written by benthamfish

March 18, 2013 at 3:38 pm

Consuming XCRI-CAP III: Skills Development Scotland


Skills Development Scotland has operated a data collection system called PROMT for many years. PROMT is a client application (not browser-based) that sits on your computer and presents you with a series of screens for each course you want to maintain. Each course may have many ‘opportunities’ (these are the same as XCRI-CAP presentations) with different start dates, visibility windows and other characteristics. Many fields in PROMT have specific requirements for content that make the experience of keying not particularly enjoyable (though it has been improved since first launch).

With OU course marketing information consisting of several hundred courses and over 1,000 opportunities, it was with some relief that we at APS (running third-party course marketing information dissemination for The OU) turned to the SDS Bulk Update facility, using XCRI-CAP 1.1. We had initially been nervous of using this facility, because PROMT data is used not only for the SDS course search service but also has a direct link to a student registration and tracking service for ILAs (Individual Learning Accounts; for non-Scottish readers, ILAs continued in Scotland even though they were discontinued for a while south of the border). Students can get ILA funding only for specific types of course, so each course/opportunity has to be approved by Skills Development Scotland. Changes to the course marketing information can result in ILA approval being automatically rescinded (albeit temporarily), which can mean the provider losing student tracking details and therefore being at risk of losing the student entirely. So naturally we decided to do some careful testing in conjunction with both SDS and our colleagues at The OU’s Scottish office.

Fortunately, we discovered that when we uploaded opportunities, the system added them to existing records rather than replacing them, so student tracking was unaffected. In addition, individual fields of existing course records were over-written, but the records remained active and their opportunities were unchanged. These features meant that data integrity was maintained for the opportunity records, and we could always revert to the existing version and delete the new one if necessary.

We were able to load new courses with new opportunities, and also existing courses with new opportunities with no significant problems. The potential ILA difficulty was somewhat reduced, because The OU’s information for an individual opportunity does not need to be updated once it has been approved for ILA; our main reason for updating opportunities themselves was to add in fees information, but cost information has to be present before an opportunity can gain ILA approval, so this type of update would not interrupt ILA approval or student tracking.

Owing to requirements for some proprietary data, for example numerical fees information and separate VAT, not everything could be captured through XCRI-CAP. However, using the PROMT interface for checking the data, adding in very small extras and deleting duplicated opportunities was comparatively light work, as the bulk of it was handled by the XCRI-CAP import.

Strikingly good parts of our Bulk Update process (apart from the obvious vast reduction in keying time):

  • Use of a vocabulary for qualification type in PROMT. This made it easy to use various rules to map from The OU data to the required qualification grouping. These rules included a close examination of the content of the qualification title in the XCRI-CAP data to make sure we mapped to the correct values (a rough sketch of this kind of rule appears after this list).
  • For some elements, use of standardised boilerplate text in specific circumstances, again identified by business rules.
  • Good reporting back from the SDS Bulk Update system on the status of (and errors from) the import. This included an online status report showing how many records of each type had been successfully uploaded, with date and time, available within a few minutes of loading.
  • The system permits us to download the whole data set (well, technically as much as could be mapped) in XCRI-CAP 1.1 format, so we were able to compare the whole new set of records with what we expected to have.
  • The ability to review the new data in the PROMT client interface within minutes of the Bulk Upload. This gives great reassurance that nothing’s gone wrong, and it permits rapid checking and small tweaks if necessary.
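
To illustrate the first point above, here is a minimal sketch of the sort of rule-based mapping involved: inspect the qualification title from the XCRI-CAP data and choose a qualification grouping. The groupings and keywords are illustrative assumptions, not SDS’s actual vocabulary or our production rules.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative rule-based mapping from a qualification title (as found in the
// XCRI-CAP data) to a qualification grouping. The groupings and keywords here
// are assumptions for the sketch, not SDS's real vocabulary.
public class QualificationMapper {

    // Ordered rules: the first matching keyword wins.
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("postgraduate certificate", "Postgraduate Qualification");
        RULES.put("postgraduate diploma", "Postgraduate Qualification");
        RULES.put("master", "Postgraduate Qualification");
        RULES.put("bachelor", "Degree");
        RULES.put("diploma of higher education", "Higher Education Diploma");
        RULES.put("certificate of higher education", "Higher Education Certificate");
    }

    static String mapQualification(String qualificationTitle) {
        String title = qualificationTitle.toLowerCase();
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            if (title.contains(rule.getKey())) {
                return rule.getValue();
            }
        }
        return "Other"; // flag for manual checking in PROMT
    }

    public static void main(String[] args) {
        System.out.println(mapQualification("Master of Arts in Art History"));
        System.out.println(mapQualification("Certificate of Higher Education in Computing"));
    }
}
```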

I see this combination of bulk upload with a client or web-based edit and review interface as an excellent solution to course marketing information collection. This push method of data synchronisation has the advantage of maintaining the provider’s control of the supply, and it still permits fine-tuning, checking and manual editing where necessary. In contrast, a fully automatic ‘pull’ version might leave the provider out of the loop, not knowing whether the data has been updated or whether any mistakes have been made. This is particularly important in cases where the collector is unfamiliar with the provider’s data.


XCRI-CAP: turn 12 days of keying into 3 hours of checking.

Written by benthamfish

March 6, 2013 at 2:50 pm