I have discovered a strange behavior in postgres's crosstab function that I cannot explain, but hope someone else may...
The version of the crosstab function I'm using requires first building a preliminary table.
This SQL successfully creates the preliminary table:
SELECT
ST.studyabrv||' '||S.labid||' '||S.subjectid||' '||S.box::varchar||' '||S.well AS "rowname",
M.marker AS "bucket",
G.allele1||' '||G.allele2 AS "bucket_value"
INTO TABLE ct
FROM
geno.gmarkers M,
geno.genotypes G,
geno.gsamples S,
geno.guploads U,
geno.gibg_studies ST
WHERE
G.markers_id=M.id
AND G.gsamples_id=S.id
AND S.guploads_id=U.id
AND U.ibg_study_id=ST.id
AND ( M.id=5 OR M.id=6 OR M.id=2 OR M.id=4 OR M.id=3)
AND ( S.labid='CL100001' OR S.labid='CL100002' OR S.labid='CL100003' OR S.labid='CL100004' OR S.labid='CL100005' OR S.labid='CL100006' OR S.labid='CL100007' OR S.labid='CL100008' OR S.labid='CL100009' OR S.labid='CL100010' OR S.labid='CL100011' OR S.labid='CL100012' OR S.labid='CL100013' OR S.labid='CL100014' OR S.labid='CL100015')
ORDER BY box,well;
Which produces output like:
rowname | bucket | bucket_value
--------------------------+-----------+--------------
LTS CL100001 10011 1 A01 | 5HTTLPR-T | S La
LTS CL100001 10011 1 A01 | 5HTTLPR-D | 14 16
LTS CL100001 10011 1 A01 | DAT1 | 440 480
LTS CL100001 10011 1 A01 | DRD4 | 475 475
LTS CL100001 10011 1 A01 | Caspi | 351 351
LTS CL100009 10420 1 A02 | Caspi |
LTS CL100009 10420 1 A02 | 5HTTLPR-T | La Lg
LTS CL100009 10420 1 A02 | 5HTTLPR-D | 16 16
LTS CL100009 10420 1 A02 | DAT1 | 440 480
LTS CL100009 10420 1 A02 | DRD4 | 475 475
...
However, if I attempt to include a date column whose values are all null, as in:
SELECT
ST.studyabrv||' '||S.labid||' '||S.subjectid||' '||S.box::varchar||' '||S.well||' '||G.run_date::text AS "rowname",
M.marker AS "bucket",
G.allele1||' '||G.allele2 AS "bucket_value"
INTO TABLE ct
FROM
geno.gmarkers M,
geno.genotypes G,
geno.gsamples S,
geno.guploads U,
geno.gibg_studies ST
WHERE
G.markers_id=M.id
AND G.gsamples_id=S.id
AND S.guploads_id=U.id
AND U.ibg_study_id=ST.id
AND ( M.id=5 OR M.id=6 OR M.id=2 OR M.id=4 OR M.id=3)
AND ( S.labid='CL100001' OR S.labid='CL100002' OR S.labid='CL100003' OR S.labid='CL100004' OR S.labid='CL100005' OR S.labid='CL100006' OR S.labid='CL100007' OR S.labid='CL100008' OR S.labid='CL100009' OR S.labid='CL100010' OR S.labid='CL100011' OR S.labid='CL100012' OR S.labid='CL100013' OR S.labid='CL100014' OR S.labid='CL100015')
ORDER BY box,well;
This produces the output:
rowname | bucket | bucket_value
---------+-----------+--------------
| 5HTTLPR-T | S La
| 5HTTLPR-D | 14 16
| DAT1 | 440 480
| DRD4 | 475 475
| Caspi | 351 351
| Caspi |
| 5HTTLPR-T | La Lg
| 5HTTLPR-D | 16 16
As you can see, appending the run_date column to the end of the "rowname" composite column renders the entire composite blank, which is crazy. If I populate run_date with dummy data, it shows up, but if it is null, the whole "rowname" goes blank.
I cannot tell if this is a bug in postgres, but it is a bizarre result that I would like to resolve, if possible.
TIA, rixter
You should think of null as an unknown value. null values are not numbers or strings, so you can't operate on them as if they were. In particular, concatenating anything with null using || yields null, which is why the whole "rowname" goes blank. So make sure you use a function that returns a non-null value, such as coalesce(), which returns its first non-null parameter from left to right; put a default value as the right-most parameter. Because run_date is a date and '' is not a valid date, cast to text inside coalesce():
|| coalesce(G.run_date::text, '')
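To see the behavior in isolation, here is a minimal sketch that can be run in psql without any tables. The literal values are made up for illustration, and the last statement assumes PostgreSQL 9.1 or later, where concat_ws() is available and skips null arguments.

```sql
-- Concatenating with NULL makes the whole expression NULL.
SELECT 'LTS' || ' ' || NULL::date::text AS rowname;
-- rowname is NULL

-- Cast to text first, then replace NULL with an empty string.
SELECT 'LTS' || ' ' || coalesce(NULL::date::text, '') AS rowname;
-- rowname is 'LTS ' (note the trailing space)

-- Alternative: concat_ws() ignores NULL arguments and adds no stray separator.
SELECT concat_ws(' ', 'LTS', 'CL100001', NULL::date::text) AS rowname;
-- rowname is 'LTS CL100001'
```

The same rule explains the empty bucket_value on the Caspi row in the first output: if either allele1 or allele2 is null, their concatenation is null as well.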