Background information for data providers

From TV-Browser Wiki

Revision as of 10:41, 1 May 2009

This article contains background information on the topic Providing TV listings with a Primary Data Service.

How the tools work

Get and prepare raw data

The tool PDSRunner starts the parsers, which fetch the program data for the channels and convert it into a TV-Browser-specific format. The resulting data files are stored in the raw directory.
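The division of labor between PDSRunner and its parsers can be sketched as follows. This is a minimal Python sketch, not the actual PDSRunner code: the `ChannelParser` interface, the toy `CsvDemoParser` and the `run_parsers` function are hypothetical, and JSON only stands in for the real TV-Browser-specific file format.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Protocol


@dataclass
class Program:
    """One converted program entry (fields are illustrative only)."""
    channel: str
    start: str  # "HH:MM"; hypothetical representation
    title: str


class ChannelParser(Protocol):
    """Hypothetical interface: one parser per channel or data source."""
    channel: str

    def fetch(self) -> str: ...

    def convert(self, raw_text: str) -> list[Program]: ...


class CsvDemoParser:
    """Toy parser that reads 'HH:MM;Title' lines from a string."""

    def __init__(self, channel: str, text: str) -> None:
        self.channel = channel
        self._text = text

    def fetch(self) -> str:
        # A real parser would download the listings at this point.
        return self._text

    def convert(self, raw_text: str) -> list[Program]:
        programs = []
        for line in raw_text.strip().splitlines():
            start, title = line.split(";", 1)
            programs.append(Program(self.channel, start, title))
        return programs


def run_parsers(parsers: list[ChannelParser], raw_dir: Path) -> list[Path]:
    """Run each parser and store its converted data in raw_dir."""
    raw_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for parser in parsers:
        programs = parser.convert(parser.fetch())
        target = raw_dir / f"{parser.channel}.json"
        # JSON stands in for the real TV-Browser-specific format.
        target.write_text(json.dumps([asdict(p) for p in programs]))
        written.append(target)
    return written
```

Keeping fetching and converting behind one small interface per source means a new channel only needs a new parser, while the runner stays unchanged.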

Create diff files

The tool PrimaryDataManager looks for differences between the new data in the raw directory and the already existing data in the prepared directory. If it finds differences, it creates update files.
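A minimal sketch of the comparison step, assuming that each day's data is a mapping from program ID to the program's fields and that the new raw programs have already been matched to their IDs (how the PrimaryDataManager does this is not described here). `diff_day` is a hypothetical name; it returns nothing when both versions are equal, otherwise the new or changed programs and the removed IDs, which is roughly what an update file has to carry.

```python
from typing import Optional

# Hypothetical model of one channel and day: program ID -> program fields.
DayData = dict[int, dict[str, str]]


def diff_day(prepared: DayData, raw: DayData) -> Optional[dict]:
    """Compare the new raw data with the prepared data of the same day.

    Returns None if nothing changed; otherwise the new or changed
    programs (with their new data) and the IDs that disappeared.
    """
    changed = {pid: fields for pid, fields in raw.items()
               if prepared.get(pid) != fields}
    removed = sorted(pid for pid in prepared if pid not in raw)
    if not changed and not removed:
        return None  # no update file needed
    return {"changed": changed, "removed": removed}
```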

To keep the traffic for data providers low, the data is divided into different files:

  • "base" contains the time, title and other data such as actors
  • "more00-16" contains the descriptions for programs between 0.00 and 16.00
  • "more16-00" contains the descriptions for programs between 16.00 and 0.00
  • "picture00-16" contains the pictures for programs between 0.00 and 16.00
  • "picture16-00" contains the pictures for programs between 16.00 and 0.00
  • For days with more than 255 programs, there are also "additional" files

The only file that is always required is "base".
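To illustrate the split, here is a sketch that distributes one day's programs over these files. It is an assumption-based model, not the real file format: entries are plain Python tuples, a program starting at exactly 16.00 is counted as "16-00", and the "additional" files for days with more than 255 programs are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Program:
    id: int
    start_minutes: int  # minutes after midnight, 0..1439
    title: str
    description: Optional[str] = None
    picture: Optional[bytes] = None


def split_into_files(programs: list[Program]) -> dict[str, list]:
    """Distribute one day's programs over "base", "more" and "picture" files.

    "base" is always created; the other files only when some program
    has data for them. "additional" files are not modelled here.
    """
    files: dict[str, list] = {"base": []}
    for p in programs:
        files["base"].append((p.id, p.start_minutes, p.title))
        # Assumption: a program starting at exactly 16.00 counts as "16-00".
        half = "00-16" if p.start_minutes < 16 * 60 else "16-00"
        if p.description is not None:
            files.setdefault("more" + half, []).append((p.id, p.description))
        if p.picture is not None:
            files.setdefault("picture" + half, []).append((p.id, p.picture))
    return files
```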

Each program gets an ID that links its entry in the "base" file to the matching entries in the "more" and "picture" files. The IDs are also needed to keep the data consistent in the "update" files.
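The role of the ID can be shown with a small, hypothetical example: `assemble` rebuilds complete programs from "base" and "more" entries by matching their IDs, and `find_orphans` lists IDs that appear in a "more" file without a matching "base" entry, i.e. data that can no longer be assigned to any program. The tuple layouts (ID, start, title) and (ID, description) are assumptions made for this example.

```python
def assemble(base: list[tuple[int, int, str]],
             more: list[tuple[int, str]]) -> dict[int, dict]:
    """Rebuild full program records by matching entries on their ID."""
    descriptions = dict(more)  # ID -> description
    return {pid: {"start": start, "title": title,
                  "description": descriptions.get(pid)}
            for pid, start, title in base}


def find_orphans(base: list[tuple[int, int, str]],
                 more: list[tuple[int, str]]) -> list[int]:
    """IDs in a "more" file that have no entry in "base"."""
    base_ids = {pid for pid, _, _ in base}
    return sorted(pid for pid, _ in more if pid not in base_ids)
```

Because the match is made by ID and not by position, the order of entries in the individual files does not matter, but IDs must stay stable across all files of a day.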

Update files

On the first run, "full" files are created. If the PrimaryDataManager finds changes in the data, it creates "update" files as needed. These update files contain the ID of the program and the new data. The changes are also written into the previous update files and into the initial "full" version.
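One way to read the paragraph above is that the "full" file always holds the newest data and that update file k brings a client from version k straight to the newest version, so a client never needs more than one update file. The following Python sketch models this interpretation only; the class `DayFileSet`, its methods and the version numbering are hypothetical and not taken from the PrimaryDataManager. Programs are reduced to an ID and a title.

```python
class DayFileSet:
    """Hypothetical model of the files for one channel and day.

    Assumption: update file k brings a client from version k to the
    newest version, so a client only ever needs one update file.
    """

    def __init__(self, programs: dict[int, str]) -> None:
        self.version = 1
        self.full = dict(programs)  # always the newest complete data
        self.updates: dict[int, dict[int, str]] = {}  # from-version -> changes

    def apply_changes(self, changes: dict[int, str]) -> None:
        # Write the changes into every existing update file ...
        for update in self.updates.values():
            update.update(changes)
        # ... create a new update file for clients on the current version ...
        self.updates[self.version] = dict(changes)
        # ... and write them into the "full" version as well.
        self.full.update(changes)
        self.version += 1

    def bring_up_to_date(self, client_version: int,
                         client_data: dict[int, str]) -> dict[int, str]:
        """Data a client at client_version has after applying one update file."""
        result = dict(client_data)
        if client_version < self.version:
            result.update(self.updates[client_version])
        return result
```

Every new update is computed against the stored state, so this chain only stays correct as long as that state is kept.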

Upload the data

The tool MirrorUpdater uploads the files, together with some additional information, to the mirrors.

Tips for data providers

Because the tools create the diff files from the data in the "prepared" directory, you must never delete this directory. Otherwise the "prepared" directory becomes inconsistent with the files on the mirrors and with the files users have already downloaded.

If the "prepared" directory has been deleted accidentally

If the "prepared" directory has been deleted for some reason, the following options may help to limit the damage.

All of the following hints are somewhat experimental: they might not work as expected and might even make things worse than before. So always create backups of all data before performing any of the following steps, and make sure you understand what you are doing.
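A backup can be as simple as a complete copy of the directory in a separate location. The following minimal Python sketch uses only the standard library; the function name `backup_directory`, the timestamped folder name and the `my-backups` location in the example are hypothetical choices, not part of the TV-Browser tools.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_directory(source: Path, backup_root: Path) -> Path:
    """Copy `source` into a new timestamped folder below `backup_root`.

    Refuses to overwrite an existing copy, so an earlier backup is
    never lost by accident.
    """
    if not source.is_dir():
        raise FileNotFoundError(f"{source} is not a directory")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"{source.name}-{stamp}"
    # copytree creates target (and missing parents) and raises
    # FileExistsError if target already exists.
    shutil.copytree(source, target)
    return target
```

For example, `backup_directory(Path("prepared"), Path("my-backups"))` would create a folder such as `my-backups/prepared-20090501-104100`.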

1. Make a backup of the "prepared" directory, then delete its contents.

2.a) - If there has been at most one update since the deletion and no files have been uploaded to the mirrors since then, you can copy the files from the "backup" directory into the "prepared" directory. Make sure, however, that the files in the "backup" directory really are the ones that existed before the "prepared" directory was deleted. (The "backup" directory should contain more "update" files than the "prepared" directory.)

2.b) - If there has been more than one update (which also means that the "backup" directory contains the wrong files), if you are not sure how many updates have run since the deletion, or if files have been uploaded to the mirrors since then: copy all the files from the mirror into the "prepared" directory.

3. Start the PrimaryDataManager. If it succeeds, everything should hopefully be fine again. If it fails with messages like "Converting Day program (..) failed" and "Program frame with ID x has no start time" and you performed step 2.b), do the following:

For each channel where the PrimaryDataManager fails, start the PrimaryDataManager with the argument "-forceCompleteUpdate" followed by the channel's name. The PrimaryDataManager then creates new IDs for all programs (on all days) of this channel. This produces a lot of new "update" files, but it will hopefully fix the inconsistency in the data.

4. Run the MirrorUpdater.
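The choice between steps 2.a and 2.b, and whether step 3 needs "-forceCompleteUpdate", can be summarized as a small decision function. This Python sketch only encodes the conditions stated above and does not run any of the tools; the `Situation` fields and the function names are hypothetical, and treating "backup" files that cannot be verified as a case for 2.b is our reading of the text, not an official rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Situation:
    """Facts to establish before choosing step 2.a or 2.b."""
    updates_since_deletion: Optional[int]  # None means "not sure"
    uploaded_since_deletion: bool
    backup_predates_deletion: bool  # checked by hand, see step 2.a


def choose_restore_source(s: Situation) -> str:
    """Return which files to copy into the emptied "prepared" directory."""
    if (s.updates_since_deletion is not None
            and s.updates_since_deletion <= 1
            and not s.uploaded_since_deletion
            and s.backup_predates_deletion):
        return "backup"  # step 2.a
    # Step 2.b; our reading: unusable "backup" files also lead here.
    return "mirror"


def needs_force_complete_update(restore_source: str, manager_failed: bool) -> bool:
    """Step 3: only after restoring from the mirror (2.b) and a failed run."""
    return restore_source == "mirror" and manager_failed
```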