Proposed change: Use an opaque struct instead of `[u8]` for `Mysql::RawValue` and `Pg::RawValue`


#1

Description of the change

  • <Mysql as Backend>::RawValue has been changed from [u8] to MysqlValue. You can access the underlying bytes by calling .bytes() or .as_ref().
  • <Pg as Backend>::RawValue has been changed from [u8] to PgValue. You can access the underlying bytes by calling .bytes() or .as_ref().

Why is this needed?

The first part of this is future proofing. For both PG and MySQL, we’ve seen cases where it would be useful to include a type identifier along with the bytes. This is more pressing for MySQL, but for both cases this is to ensure we can provide additional information to FromSql implementations in the future.

For MySQL, this will allow us to decouple from libmysqlclient. Currently we rely on libmysqlclient handling type coercion for us more than we intended to. A big reason for this is that MySQL has different semantics for many operations than other backends (e.g. 32 bit integer addition returns a 64 bit integer).

This results in the behavior of sql_query differing from using the normal query builder, as we don’t find out the expected type until after libmysqlclient has finished doing its thing. Instead we should make deserializing numbers on MySQL more permissive, and work with an enum for every possible numeric representation.

For PG, the motivation is less directly needed. However, it would be nice to provide the OID in the future both for strict type checking, and to allow some sort of handling of dynamic values.

Why can’t this have a deprecation cycle?

Since this affects the concrete value of an associated type, the only way we could deprecate this is by introducing a Mysql2 backend and Pg2 backend, deprecating the originals. This does the same thing is releasing Diesel 2.0, but it would break all code using mysql or PG, instead of just the code affected by this change.

Expected user impact

Code which is affected by this will take one of these three forms:

impl FromSql<Mysql> for MyType {
    fn from_sql(bytes: Option<&[u8]>) -> deserialize::Result<Self> {
        // ...
    }
}

impl FromSql<Pg> for MyType {
    fn from_sql(bytes: Option<&[u8]>) -> deserialize::Result<Self> {
        // ...
    }
}

impl<DB: Backend<RawValue = [u8]> FromSql<DB> for MyType {
    fn from_sql(bytes: Option<&[u8]>) -> deserialize::Result<Self> {
        // ...
    }
}

For MySQL, the expected impact is low. Since that backend has a fairly defined set of types with no options for extension, there is little reason to write a custom FromSql implementation. There may be some users doing this for enums, but support for those can be done without relying on implementation details already.

For PG the expected impact is moderate. Applications are likely only affected if they use PG enums (crates.io is impacted in one place). Any crate which adds support for new PG types will be affected, such as hstore or postgis. diesel_full_text_search would not be impacted.

It is unlikely there is any code using DB: Backend<RawValue = [u8]> outside of Diesel itself.

Migration strategy

On MySQL, the argument type will need to be changed to Option<&MysqlValue>. The function body should not be affected, since any FromSql implementation will likely be defined in terms of existing implementations.

On PG, the argument type will need to be changed to Option<&PgValue>. Any usage of the bytes in the function body that actually expects a &[u8] (e.g. doing something other than passing it to FromSql::from_sql for another type) will need to either call .bytes or .as_ref.

Any implementations for DB: Backend<RawValue = [u8]> will need to be changed to DB: Backend, Backend::RawValue: AsRef<u8>. The function body will need to use .as_ref in all places the bytes were being used.

In all cases, code can support both Diesel 1.0 and 2.0 by using .as_ref, since [u8]: AsRef<[u8]>.


#2

It actually looks like we won’t be able to change implementations for Backend<RawValue = [u8]> to Backend::RawValue: AsRef<[u8]>. For some reason it ends up conflicting with SQLite


#3

I’ve started migrating to this opaque struct for my own uses, and I found everything works perfectly until we get to PgResult. In PgResult::get we have this code:

    pub fn get(&self, row_idx: usize, col_idx: usize) -> Option<&PgValue> {
        if self.is_null(row_idx, col_idx) {
            None
        } else {
            let row_idx = row_idx as libc::c_int;
            let col_idx = col_idx as libc::c_int;
            unsafe {
                let value_ptr =
                    PQgetvalue(self.internal_result.as_ptr(), row_idx, col_idx) as *const u8;
                let num_bytes = PQgetlength(self.internal_result.as_ptr(), row_idx, col_idx);
                Some(&PgValue {
                    data: slice::from_raw_parts(value_ptr, num_bytes as usize).to_vec(),
                    oid: 1, //TODO
                })
            }
        }
    }

however, the PgValue obviously doesn’t live long enough. Unless there is a better solution, would a good alternative be to add the PgValue to the PgResult as an intermediate value in the struct?

pub struct PgResult {
    internal_result: RawResult,
    intermediate_value: PgValue,
}

Which would change as the result is read?

Otherwise I guess we could construct a new PgValue at the Row impl and give it Copy/Clone so it gets copied upstream?