Today I’ve mostly been working on the magic of our user data collector for Nucleus, an awesome bit of technology which takes our slightly slow existing method of finding user information and replaces it with a blisteringly fast one based on our ever-favourite database, Mongo.
What it does is – on a regular schedule – go through the entire directory letter by letter, collect all the users, and write their details to the database. How it does this, however, is a bit smarter than a bulk import: it actually looks to see whether each user has been updated or not, and records the changes. We can then use this data to do ‘push’ updates of user information – telling services which rely on user data that something has changed as soon as we can, rather than waiting for those services to go looking for changes themselves. We can also let those services do a ‘changes pull’, asking only for those records which have changed since a particular time. All of this combines to reduce network overhead and speed up processing by only sending changed details around, rather than a massive dump of all our data.
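To make that a little more concrete, here’s a rough sketch of the change-detection idea. It isn’t the actual Nucleus code: the fetch_users_for_letter() helper, the database and field names are all placeholders. It just shows the general shape of fingerprinting each record, only writing to Mongo when the fingerprint changes, and stamping changed records so a ‘changes pull’ can ask for everything updated since a given time.

```python
# A rough, hypothetical sketch of the collector idea, not the real Nucleus code.
# Assumptions: a MongoDB 'users' collection keyed on 'username', and a
# placeholder fetch_users_for_letter() standing in for the real directory lookup.
import hashlib
import json
import string
from datetime import datetime, timezone

from pymongo import MongoClient

users = MongoClient()["nucleus"]["users"]   # assumed database/collection names


def fetch_users_for_letter(letter):
    """Placeholder: the real collector would query the directory here."""
    return []   # each record would be a dict with at least a 'username'


def record_hash(record):
    """Stable fingerprint of a record, so we only write genuine changes."""
    return hashlib.sha1(json.dumps(record, sort_keys=True).encode()).hexdigest()


def collect():
    """Walk the directory letter by letter, upserting only changed users."""
    now = datetime.now(timezone.utc)
    for letter in string.ascii_lowercase:
        for record in fetch_users_for_letter(letter):
            fingerprint = record_hash(record)
            existing = users.find_one({"_id": record["username"]})
            if existing and existing.get("hash") == fingerprint:
                continue   # nothing has changed, so leave the record alone
            users.update_one(
                {"_id": record["username"]},
                {"$set": dict(record, hash=fingerprint, updated_at=now)},
                upsert=True,
            )


def changes_since(timestamp):
    """The 'changes pull': everything updated after a given time."""
    return users.find({"updated_at": {"$gt": timestamp}})
```

A ‘push’ update would then be a matter of notifying interested services whenever one of those writes fires, while something like changes_since() is what a ‘changes pull’ would ride on.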
Coming soon to Nucleus will also be the first bit of cross-service collation, as we begin to include student data such as addresses and home email addresses. Where in the past this would have meant querying four different services, getting back a mix of data types and doing a lot of massaging before anything useful came out, we’ve now done the hard bit for you. Even better, instead of giving insecure access to the data by providing direct database access, or blindly dumping the information out, access will be controlled using the power of OAuth, giving us fine-grained control over exactly who can see what.
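As a flavour of what an OAuth-protected ‘changes pull’ might look like from a consuming service’s point of view, here’s a purely hypothetical sketch: the URLs, parameter names and credentials are all made up, and the final API may well look different.

```python
# Purely hypothetical client sketch: URLs, parameters and credentials are made up.
import requests

TOKEN_URL = "https://nucleus.example.ac.uk/oauth/access_token"   # assumed endpoint
CHANGES_URL = "https://nucleus.example.ac.uk/api/users/changes"  # assumed endpoint

# The service authenticates as itself (OAuth 2.0 client credentials grant)...
token = requests.post(
    TOKEN_URL,
    data={"grant_type": "client_credentials"},
    auth=("service-id", "service-secret"),
).json()["access_token"]

# ...then asks only for the records which have changed since it last checked.
response = requests.get(
    CHANGES_URL,
    params={"since": "2012-01-01T00:00:00Z"},
    headers={"Authorization": "Bearer " + token},
)

for user in response.json():
    print(user["username"], user["updated_at"])
```

The appeal of doing it this way is that each consuming service would get its own credentials and scopes, which is where the fine-grained ‘who can see what’ control comes from.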